BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention generally relates to the evaluation of programming language
statements and in particular to a method and system for evaluating a
programming language statement that includes a first and a second
sub-statement.

2. Description of the Related Art 

Presently, several techniques have been developed for evaluating
programming language statements. Discoveries and progress in programming
language theories and methods are fundamental. First, they have a deep,
long-term impact on other research areas, in particular computational
semantics, type systems and programming models; second, they strongly
improve software development processes in both time and quality. As an
example, the object-oriented paradigm and its applications are probably the
most fertile advance of the last two decades, pervading software production
and design.

Declarative and imperative languages are commonly accepted as two different
ways of solving computational problems. With declarative languages, a
programmer expresses what the solution is, while imperative languages
require the definition of how the solution is reached by specifying
intermediate computation steps.

Declarative languages are preferred when the application field is well defined
and restricted, because they are simple and often rely on efficient algorithms,
such as the Herbrand resolution used by Prolog, constraint solving algorithms
and the simplex algorithm. Imperative languages, on the other hand, are
useful for solving general problems and allow the programmer to master
execution time. However, imperative languages come at a higher cost: they
require a much larger amount of specification and introduce additional
complexity.

There are areas today where programmers need to combine the advantages
of these two poles in a flexible and easy way. The operations for
transforming structured data, such as XML documents, typically range from
simple to very complex ones, involving different levels of abstraction. In
such cases, programmers would prefer a common framework that allows
concise, declarative notations wherever possible, while reserving low-level,
fine-grained imperative specifications for difficult cases.

For such a framework, language theorists and designers are still looking for
expressive, powerful, simple, clear and precise formalisms that capture the
most fundamental operations. However, when both semantics and typing are
considered, no general solution at the right level of abstraction has been
proposed yet.

SUMMARY OF THE INVENTION 

Given the problems of the existing technologies, it would therefore be 
advantageous to provide a method and system for evaluating a programming 
language statement capable of dealing with arbitrary structural complexity 
while preserving relevant type control and keeping the amount of basic 
building blocks reasonably small. 



It would further be advantageous to provide an evaluation technique where the 
assumptions made on the underlying language are explicit and restricted so 
that it can be embedded into existing or future, more specialized or general 
purpose programming languages. 

Further, it would be advantageous to provide an evaluation method and
system with formal syntax, typing and semantics on an unambiguous basis,
within an extensible theoretical framework that is independent of notation.
Moreover, it would be advantageous to provide such an evaluation technique
that enables the combination of declarative and imperative operations.

The present invention has been made in consideration of the above situation
and provides a method, and article of manufacture therefor, for operating a
computer system for evaluating a programming language statement that
includes a first and a second sub-statement. The first sub-statement is
evaluated, and an evaluation success result is determined if the evaluation
succeeds. If the evaluation fails, a distinguished value is determined, that is,
a value not included in the range of possible evaluation success results of the
first sub-statement. Further, it is determined whether the second
sub-statement is to be evaluated and, if so, the second sub-statement is
evaluated and an evaluation success result is determined if the evaluation
succeeds. If the evaluation fails, the distinguished value is determined. The
range of possible evaluation success results of the second sub-statement does
not include the distinguished value. An evaluation result of the statement is
determined depending at least on whether the evaluation of the first
sub-statement succeeds or fails.

The present invention further provides a computer system that is capable of
evaluating a programming language statement and determining an evaluation
result of the statement. The statement includes a first and a second
sub-statement, and the evaluation result of this statement depends on whether
the evaluation of the first and second sub-statements succeeds or fails. The
computer system is capable of evaluating the first sub-statement and
determining an evaluation success result if the evaluation succeeds, or a
distinguished value if the evaluation fails. The computer system is further
capable of evaluating the second sub-statement and determining an
evaluation success result if the evaluation succeeds, or the distinguished value
if the evaluation fails. The distinguished value is a value that is not included in
the range of possible evaluation success results.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are incorporated into and form a part of the
specification to illustrate several embodiments of the present invention. These
drawings, together with the description, serve to explain the principles of the
invention. The drawings are only for the purpose of illustrating alternative
examples of how the invention can be made and used and are not to be
construed as limiting the invention to only the illustrated and described
embodiments. Further features and advantages will become apparent from
the following and more particular description of the various embodiments of
the invention as illustrated in the accompanying drawings, wherein:

FIG. 1 illustrates a computer system according to the invention;

FIG. 2 is a general flow chart illustrating the pattern matching technique of the
invention;

FIG. 3 is a more detailed flow chart illustrating the pattern matching
technique of the invention;

FIG. 4 is a flow chart illustrating the statement evaluation process of the
invention; and

FIG. 5 illustrates an example of the hierarchy of types in a programming
language based on the techniques of the invention.



DETAILED DESCRIPTION 



Referring now to the drawings and particularly to FIG. 1, which illustrates a
computer system of the present invention, processing means 100 is provided
that is connected to an input section 110 and an output section 120. The
processing means 100 can be of any type and has access to a program code
storage 130 that stores the programming language statements and
expressions that are operated on by the invention. In the example shown in
FIG. 1, the system further comprises a statement evaluation section 140
and a pattern matching section 150 including instructions that allow the
processing means 100 to operate as discussed in more detail below.



GENERAL SYNTAX AND OPERATIONAL SEMANTICS

Before going into the details of the invention, an example of a general abstract
syntax for terms of the underlying language is provided, where the notation e
encompasses imperative and declarative statements. Operational
semantics is described by using the SOS style described in G. D. Plotkin's
article "A structural approach to operational semantics", Technical Report
DAIMI-FN-19, Computer Science Dept., Århus University, Denmark, 1981,
considering "small step" transitions in order to cover an eventual extension to
concurrent languages with interleaving semantics.

A first set of definitions, describing basic constructs that can be considered
universal, is now provided:



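$$
\begin{array}{rcll}
e & ::= & n \mid s \mid \mathit{true} \mid \mathit{false} & \text{constants: numerics, strings, booleans} \\
  & \mid & \mathit{none} \mid \mathit{unit} & \text{distinguished values} \\
  & \mid & e \star e & \text{basic operations } (\star \in \{+, -, *, /\}) \\
  & \mid & \mathbf{if}\ e_1\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 & \text{boolean choice} \\
  & \mid & \mathbf{var}\ x = e.e & \text{variable declaration} \\
  & \mid & x & \text{variables} \\[4pt]
e & ::= & \mathit{true} \mid \mathit{false} & \text{boolean literals} \\
  & \mid & e == e \mid e \mathrel{!=} e & \text{equality and difference comparisons}
\end{array}
$$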

The construction for the introduction of local variables,

var x = e in e

is often noted

let x = e in e

in functional language communities.

More sophisticated computation structures with widely understood semantics,
e.g. sequencing, loops and assignment, are described through the following
grammar extension:

sequence
simple assignment to a variable
closure computation

A transition system describes an interpreter of the language through a
transition relation that defines the computation of any expression e into a
new expression e', noted $e \to e'$. Several computation steps such as
$e \to e' \to e''$ can be abbreviated by using the notation $e \twoheadrightarrow e''$.
By definition, terminating computations reduce e to a normal value, noted v,
which cannot be reduced any further; this particular computation step is noted
$e \to_{\circ} v$, and a derivation chain that reduces to a normal value is noted
$e \twoheadrightarrow_{\circ} v$. The relation $\twoheadrightarrow_{\circ}$ is formally described through:

$$\frac{e \to_{\circ} v}{e \twoheadrightarrow_{\circ} v} \qquad \frac{e \to e' \quad e' \twoheadrightarrow_{\circ} v}{e \twoheadrightarrow_{\circ} v}$$
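To make the difference between a single step and a derivation chain concrete, the following Python sketch implements $\to$ as a function `step` and $\twoheadrightarrow_{\circ}$ as a loop that iterates it until a normal value is reached. It covers only integer numerals and the basic operations ★; the names `Num`, `BinOp`, `step` and `normalize`, the left-to-right reduction order and the use of integer division for / are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Num:
    """A numeral: the only kind of normal value in this fragment."""
    n: int


@dataclass(frozen=True)
class BinOp:
    """e ★ e with ★ in {+, -, *, /} (hypothetical AST node)."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def is_normal(e: Expr) -> bool:
    return isinstance(e, Num)


def step(e: Expr) -> Expr:
    """One transition e -> e'; e must not already be a normal value."""
    assert isinstance(e, BinOp), "normal values cannot be reduced"
    if not is_normal(e.left):                 # reduce the left operand first
        return BinOp(e.op, step(e.left), e.right)
    if not is_normal(e.right):                # then the right operand
        return BinOp(e.op, e.left, step(e.right))
    a, b = e.left.n, e.right.n                # both operands normal: e ->o v
    if e.op == '+':
        return Num(a + b)
    if e.op == '-':
        return Num(a - b)
    if e.op == '*':
        return Num(a * b)
    if e.op == '/':
        return Num(a // b)                    # assumption: integer division
    raise ValueError(f"unknown operator {e.op!r}")


def normalize(e: Expr) -> Tuple[Expr, int]:
    """e ->>o v: iterate single steps until a normal value; count the steps."""
    steps = 0
    while not is_normal(e):
        e = step(e)
        steps += 1
    return e, steps
```

For example, `normalize(BinOp('+', Num(2), BinOp('*', Num(3), Num(4))))` takes two steps and returns `Num(14)`.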

The formal semantics of basic boolean, string and arithmetic operations (noted 
★ above) will not be described here, being considered as widely understood. 

The computing environment, as well as the use and notation of variables and
references, will now be described in more detail. The computing environment,
noted S, is a mapping from names (of variables) into reduced values:

$$S = \{x_i^{v_i}\}$$

with $x_i^{v_i}$ being a denotation for the mapping of a unique label $x_i$ into a unique
reduced value $v_i$. Reference handling also requires another execution
structure, a heap

$$H = \{r_j^{v_j}\}$$

using the same notation, where r denotes references. The introduction
of recursive procedures or functions with non-lazy evaluation would simply
require a stack of mappings in order to handle local environments. The entire
environment H, S is sometimes abbreviated by Γ.



In another notation, the computing environment is noted Γ as the mapping
from variable names into reduced values:

$$\Gamma = \{x_1^{v_1}, \ldots, x_n^{v_n}\}$$



Further, $\Gamma, x^{v}$ is an abbreviation of $H, S \cup \{x^{v}\}$, or $\Gamma \cup \{x^{v}\}$, respectively, and
similarly $\Gamma, r^{v}$ is an abbreviation of $H \cup \{r^{v}\}, S$. In the general case, the
transition relation now becomes $\Gamma \vdash e \to \Gamma' \vdash e'$, although invariant execution
structures can be omitted for clarity when appropriate. This relation means
"expression e computed in environment H, S becomes e' in environment
H', S'", it being understood that H' and S' can be equal to or different from H and S.
The following equations describe the semantics of variable access [var] and
declaration [d1, d2]:

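$$\Gamma, x^{v} \vdash x \to \Gamma, x^{v} \vdash v \quad [\text{var}]$$

$$\frac{\Gamma \vdash e_1 \to \Gamma' \vdash e_1'}{\Gamma \vdash \mathbf{var}\ x = e_1.e_2 \to \Gamma' \vdash \mathbf{var}\ x = e_1'.e_2}\ [\text{d1}] \qquad \frac{\Gamma \vdash e_1 \to_{\circ} \Gamma' \vdash v}{\Gamma \vdash \mathbf{var}\ x = e_1.e_2 \to \Gamma', x^{v} \vdash e_2}\ [\text{d2}]$$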
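A minimal sketch of the rules [var], [d1] and [d2], assuming the environment is a plain Python dictionary and that [d2] fires once the initializer has been reduced to a numeral. Only numerals, addition, variable access and declarations are modeled; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Decl:
    """var x = e1 . e2 (hypothetical AST node)."""
    name: str
    init: "Expr"
    body: "Expr"


Expr = Union[Num, Var, Add, Decl]
Env = Dict[str, int]   # the environment: variable names -> reduced values


def step(env: Env, e: Expr) -> Tuple[Env, Expr]:
    """One transition  Γ ⊢ e -> Γ' ⊢ e'."""
    if isinstance(e, Var):                           # [var]
        return env, Num(env[e.name])
    if isinstance(e, Add):
        if not isinstance(e.left, Num):
            env, left = step(env, e.left)
            return env, Add(left, e.right)
        if not isinstance(e.right, Num):
            env, right = step(env, e.right)
            return env, Add(e.left, right)
        return env, Num(e.left.n + e.right.n)
    if isinstance(e, Decl):
        if not isinstance(e.init, Num):              # [d1]: reduce e1 first
            env, init = step(env, e.init)
            return env, Decl(e.name, init, e.body)
        return {**env, e.name: e.init.n}, e.body     # [d2]: continue with Γ, x^v ⊢ e2
    raise ValueError(f"no rule applies to {e!r}")


def normalize(env: Env, e: Expr) -> Tuple[Env, Expr]:
    while not isinstance(e, Num):
        env, e = step(env, e)
    return env, e
```

As in [d2], the declared variable simply extends the environment and is not removed after e2 has been evaluated; proper scoping would need the stack of mappings mentioned above.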



Reference creation and dereferencing involve intermediate distinguished
values r (references) on which no other operations are defined, as opposed to
pointers and pointer arithmetic.

$$\frac{\Gamma \vdash e \to \Gamma' \vdash e'}{\Gamma \vdash @e \to \Gamma' \vdash @e'}\ [\text{ref1}] \qquad \frac{\Gamma \vdash e \to \Gamma' \vdash e'}{\Gamma \vdash\ !e \to \Gamma' \vdash\ !e'}\ [\text{dref1}]$$

$$\Gamma \vdash @v \to \Gamma, r^{v} \vdash r\ [\text{ref2}] \qquad \Gamma, r^{v} \vdash\ !r \to \Gamma, r^{v} \vdash v\ [\text{dref2}]$$

In [ref2], note that a new unique reference r is created in the heap and that
this reference is returned as the result of the computation of @v.
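The following sketch mirrors [ref1], [dref1], [ref2] and [dref2], assuming the heap H is a dictionary from opaque `Ref` objects to values and that fresh references come from a counter. The names `Ref`, `MkRef` (for @e), `Deref` (for !e) and the restriction of values to integers, strings and references are assumptions for the example.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """An opaque reference r: no arithmetic is defined on it."""
    addr: int


@dataclass(frozen=True)
class MkRef:
    """@e: create a reference to the value of e."""
    expr: "Expr"


@dataclass(frozen=True)
class Deref:
    """!e: access the value referenced by e."""
    expr: "Expr"


Value = Union[int, str, Ref]
Expr = Union[Value, MkRef, Deref]
Heap = Dict[Ref, Value]

_addresses = itertools.count()   # source of fresh, unique references


def is_value(e: Expr) -> bool:
    return isinstance(e, (int, str, Ref))


def step(heap: Heap, e: Expr) -> Tuple[Heap, Expr]:
    if isinstance(e, MkRef):
        if not is_value(e.expr):                 # [ref1]
            heap, inner = step(heap, e.expr)
            return heap, MkRef(inner)
        r = Ref(next(_addresses))                # [ref2]: new unique r in the heap
        return {**heap, r: e.expr}, r
    if isinstance(e, Deref):
        if not is_value(e.expr):                 # [dref1]
            heap, inner = step(heap, e.expr)
            return heap, Deref(inner)
        return heap, heap[e.expr]                # [dref2]
    raise ValueError(f"no rule applies to {e!r}")


def normalize(heap: Heap, e: Expr) -> Tuple[Heap, Value]:
    while not is_value(e):
        heap, e = step(heap, e)
    return heap, e
```

Because `Ref` exposes no arithmetic, the only things one can do with a reference are store it, compare it and dereference it, in line with the distinction from pointers made above.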

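In the test (if-then-else) operation, environment modifications are allowed
during evaluation of the boolean part. This enables the use of a matching
operation as the condition of the test.

$$\frac{\Gamma \vdash e_1 \to \Gamma' \vdash e_1'}{\Gamma \vdash \mathbf{if}\ e_1\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 \to \Gamma' \vdash \mathbf{if}\ e_1'\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3}$$

$$\mathbf{if}\ \mathit{true}\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 \to e_2\ [\text{if1}] \qquad \mathbf{if}\ \mathit{false}\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 \to e_3\ [\text{if2}]$$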

The semantics of the basic numerical operations +, /, *, - is well known in the
art. The + operator is polymorphic, i.e. it applies with different semantics to
numerals (addition), strings and sequences (concatenation), multisets
(disjoint union, which is commutative, i.e. $m_1 + m_2 = m_2 + m_1$), and
dictionaries (non-commutative union with right priority).
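A possible dispatch for the polymorphic + operator, sketched under the assumption that numerals are Python numbers, strings are `str`, sequences are lists, multisets are `collections.Counter` objects and dictionaries are `dict`. Records are left out, and `plus` is a hypothetical name.

```python
from collections import Counter
from typing import Any


def plus(a: Any, b: Any) -> Any:
    """Polymorphic + (sketch): numerals, strings, sequences, multisets, dictionaries."""
    if isinstance(a, Counter) and isinstance(b, Counter):
        return a + b                    # multisets: disjoint (additive) union
    if isinstance(a, dict) and isinstance(b, dict):
        return {**a, **b}               # dictionaries: the right operand has priority
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b                    # numerals: addition
    if isinstance(a, str) and isinstance(b, str):
        return a + b                    # strings: concatenation
    if isinstance(a, list) and isinstance(b, list):
        return a + b                    # sequences: concatenation
    raise TypeError(f"+ is not defined for {type(a).__name__} and {type(b).__name__}")
```

Counter addition is commutative, as required for multisets (the `Counter` check comes first because `Counter` is a `dict` subclass), while the dictionary merge lets the right operand win on duplicate keys, so swapping the operands can change the result.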

PATTERN MATCHING 

The pattern matching technique, which will now be described in more detail,
evaluates a first code structure that represents an expression, analyzes a
second code structure that represents the filter to be matched by the
expression, and filters the value determined by evaluating the first code
structure according to filter characteristics determined by the second code
structure. The first code structure, hereafter denoted "data structure", and the
second code structure, hereafter denoted "pattern structure", are constructed
symmetrically so that matching operations can be specified on various and
arbitrarily complex data structures, such as strings, sequences, sets,
dictionaries and records, but also on trees, DAGs (Directed Acyclic Graphs)
and general graphs. This will be shown in more detail hereafter.

In the following, it is assumed that it is possible to declare local variables which
will potentially be assigned to parts of, or to the entire, filtered structures.
Expressions, such as e, e', can be literal constants, variables or basic
operations, such as e + e, e * e. References are considered because they
allow the sharing of substructures and the modeling of DAGs and graphs.

Pattern matching operations make it possible to check whether a given
expression e contains a structural pattern, and to extract a part of its
content, in a single operation. A pattern matching operation is made up of
three parts. The first part is an expression e against which the filter is to be
checked; this part is called the subject. The second and third parts are the
pattern matching operator # and the filter f. The subject can be a variable, a
constant, or a more complex expression that will be evaluated prior to
matching.

Thus, a pattern matching operation looks like 
e#f. 

These operations return a boolean value. If the operation succeeds, meaning 
that the expression matches the filter and, optionally, that variables have been 
assigned to a part of the subject's content, then it returns true. If it fails, 
meaning that the expression did not match the filter structure or that it did not 
contain a given value, then false is returned. 

By convention, normal values are noted v, or n and s if they are numerics or
strings, respectively. As shown above, a derivation chain that reduces to a normal
value is noted $e \twoheadrightarrow_{\circ} v$. The general semantics of the matching operation e # f
requires, in the following order: the evaluation (step 300 in FIG. 3) of the left
operand e (normalized to v), the evaluation (step 310) of the right operand, i.e.
the filter f, and the application (step 320) of the filter to the normalized value,
which returns either true or false. It will be appreciated that the term "value"
does not necessarily refer to a numeric value, as the expression may be a
non-numeric expression. Moreover, the environment might be modified by the
matching operation, whatever result is obtained. Normalized filters are noted
as bold letters. The formal semantics of the matching operation is twofold. The
first stage computes a reduced form, first for the subject ([match1] iterated) and
then for the filter ([f-match1] iterated):

$$\frac{\Gamma \vdash e \to \Gamma' \vdash e'}{\Gamma \vdash e \,\#\, f \to \Gamma' \vdash e' \,\#\, f}\ [\text{match1}] \qquad \frac{\Gamma \vdash f \to \Gamma' \vdash f'}{\Gamma \vdash v \,\#\, f \to \Gamma' \vdash v \,\#\, f'}\ [\text{f-match1}]$$

The last stage, i.e. the filter application itself, depends on the filter and data 
structure considered and will be defined case by case in the following. 
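The order of the three steps matters when evaluating the subject has side effects. The sketch below is a simplified model, assuming expressions and filters are represented as Python callables over an environment dictionary; `match_op`, `percent_filter` and `incr_x` are hypothetical names, and only the %e filter is modeled.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Filter = Callable[[Any, Env], bool]            # a normalized filter applied to a value


def match_op(subject: Callable[[Env], Any],
             filter_expr: Callable[[Env], Filter],
             env: Env) -> bool:
    """e # f in three stages (steps 300, 310 and 320 of FIG. 3)."""
    v = subject(env)          # step 300: reduce the subject e to a normal value v
    f = filter_expr(env)      # step 310: reduce the filter (evaluate %e operands)
    return f(v, env)          # step 320: apply the normalized filter to v


def percent_filter(expr: Callable[[Env], Any]) -> Callable[[Env], Filter]:
    """Hypothetical %e: its operand is evaluated when the filter is reduced."""
    def reduce_filter(env: Env) -> Filter:
        expected = expr(env)
        return lambda v, _env: v == expected
    return reduce_filter


def incr_x(env: Env) -> int:
    """A subject with a side effect, standing in for x := x + 1; x."""
    env['x'] += 1
    return env['x']
```

With `env = {'x': 1}`, `match_op(incr_x, percent_filter(lambda env: env['x']), env)` returns `True`: the subject is reduced first (x becomes 2), and only then is the operand of %x evaluated. Reducing the filter first would compare 2 against 1 and fail.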

Examples of structure constructors are now presented in order to illustrate the
symmetry between them and the corresponding filter constructors described
below. The structure constructors and filter constructors are indicator
elements indicating the respective data type:

{e1, e2, ...}                    Multiset: a collection type in which elements are
                                 not ordered and which can contain multiple
                                 instances of the same value
[e1, e2, ...]                    Sequence: an ordered collection of elements of a
                                 common type
<e1, e2, ...>                    Tuple: an ordered collection of elements which has
                                 a fixed size; elements need not be of a common type
<name1 = e1, name2 = e2, ...>    Record: an unordered collection of named elements
{key1 = e1, key2 = e2, ...}      Dictionary: an unordered collection of elements
                                 that are each accessed by a key; each key must
                                 be unique
@(e)                             Reference: a reference to a value (itself computed
                                 from an expression e) stored in memory
!                                De-referencing operator, used to access a
                                 referenced value



Further, there is provided a + operator that can take different semantics
depending on the data structure, e.g. arithmetic addition, string concatenation,
set union, etc.



Filters are expressed using a small number of operators and can describe
many different structures. Moreover, filters closely resemble the structures
to be matched, which makes them rather easy to specify. Filters can match
sequences, multisets, tuples, records, dictionaries, strings and any
combination of these structures. All filters are built using three basic filter
operators, in addition to filter constructors, which use the same notation as their
data constructor counterparts. The three basic filter operators are the test
operator, the existence operator and the assignment operator.

The test operator, noted % in the present example, is used in conjunction with
a constant, variable or complex expression, which will be evaluated prior to
matching. It tests the occurrence of the value given by its operand at some
point in the structure. For instance,

e # %'circus'

tests whether e is a string equal to 'circus' or not.

e # [%(2+2), %3]

checks that e is a sequence of exactly two items, of values 4 and 3, in that
order.
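A minimal Python sketch of the test operator and of the sequence filter constructor, assuming sequences are modeled as Python lists and that the operand of % has already been evaluated (which is why `Equal(2 + 2)` holds the value 4). `Equal`, `SeqF` and `match` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Equal:
    """%e, where the operand e has already been evaluated to a value."""
    value: Any


@dataclass(frozen=True)
class SeqF:
    """[f1, ..., fn]: filter constructor mirroring the sequence constructor."""
    items: Tuple[Any, ...]


def match(v: Any, f: Any, env: Dict[str, Any]) -> bool:
    if isinstance(f, Equal):
        return v == f.value                       # structural equality
    if isinstance(f, SeqF):
        return (isinstance(v, list)               # sequences modeled as lists
                and len(v) == len(f.items)
                and all(match(x, g, env) for x, g in zip(v, f.items)))
    raise TypeError(f"unsupported filter {f!r}")
```

For instance, `match([4, 3], SeqF((Equal(2 + 2), Equal(3))), {})` succeeds, whereas the same filter rejects `[3, 4]`, since sequence filters are ordered.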

The existence operator, noted ?, is defined as 'match any element'. For
instance,

e # <?, ?>

matches any tuple that has exactly two elements, like

<'circus', 47>.

20 

The assignment operator, noted ?x, where x is a variable name, is used to
extract a part of the subject and assign this part to x. For instance,

e # <%'circus', ?, ?y>

will succeed if e is a tuple containing three elements. The first element has to
be the string 'circus', the second one can be anything, and the value of the
third element is assigned to variable y. The same pattern matching operation
on a sequence would be very similar:

e # [%'circus', ?, ?y]

changing only the constructor from tuple to sequence.
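The existence and assignment operators can be added to the same kind of sketch. Here tuples are modeled as Python tuples and sequences as lists, so that the filters <...> and [...] differ only in the constructor they check, as in the text; all class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

Env = Dict[str, Any]


@dataclass(frozen=True)
class Exists:
    """? : matches any element."""


@dataclass(frozen=True)
class Bind:
    """?x : matches any element and assigns it to variable x."""
    name: str


@dataclass(frozen=True)
class Equal:
    """%e with e already evaluated."""
    value: Any


@dataclass(frozen=True)
class TupleF:
    """<f1, ..., fn>: tuple filter constructor (tuples modeled as Python tuples)."""
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class SeqF:
    """[f1, ..., fn]: sequence filter constructor (sequences modeled as lists)."""
    items: Tuple[Any, ...]


def match(v: Any, f: Any, env: Env) -> bool:
    if isinstance(f, Exists):
        return True
    if isinstance(f, Bind):
        env[f.name] = v          # the environment is updated in place
        return True
    if isinstance(f, Equal):
        return v == f.value
    if isinstance(f, (TupleF, SeqF)):
        kind = tuple if isinstance(f, TupleF) else list   # same shape, other constructor
        return (isinstance(v, kind) and len(v) == len(f.items)
                and all(match(x, g, env) for x, g in zip(v, f.items)))
    raise TypeError(f"unsupported filter {f!r}")
```

`match(('circus', 0, 'last'), TupleF((Equal('circus'), Exists(), Bind('y'))), env)` succeeds and sets `env['y']` to `'last'`; the same filter applied to a list fails, while the `SeqF` variant succeeds.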

30 

In addition to these basic filter operators, there is provided a concatenation
operator, noted ++, that combines compatible elements. It can be seen as the
dual of the concatenation operator (noted +, see above) of data
structures. For instance, the string 'circus' can be thought of as the
concatenation of three substrings

'ci' + 'r' + 'cus'

In a symmetrical way, the filter appears as

e # %'ci' ++ %'r' ++ %'cus'.

For finding strings containing the character 'r', the filter

? ++ %'r' ++ ?

can be used, which means zero or more elements, plus the letter 'r', and a
further zero or more elements. This operator can be used with any structure
that supports concatenation (strings, sequences, multisets and dictionaries),
making it possible to take other elements into account. For instance, e # [?x]
matches only sequences consisting of exactly one element, and assigns this
element to x. To express the fact that a sequence e must contain at least one
element, at any position, ? combined with the ++ operator can be used:

e # ? ++ [?x] ++ ?.
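The concatenation filter can be read as a search over all decompositions of the subject, shortest left part first. The sketch below assumes that ++ groups to the left (`cat`) and handles only strings and lists; with this search order, ? ++ %'r' ++ ?x binds x to the text after the first 'r'. All names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict, Tuple

Env = Dict[str, Any]


@dataclass(frozen=True)
class Exists:
    """?"""


@dataclass(frozen=True)
class Bind:
    """?x"""
    name: str


@dataclass(frozen=True)
class Equal:
    """%e (operand already evaluated)"""
    value: Any


@dataclass(frozen=True)
class SeqF:
    """[f1, ..., fn]"""
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class Concat:
    """f1 ++ f2"""
    left: Any
    right: Any


def cat(*filters: Any) -> Any:
    """f1 ++ f2 ++ ... ++ fn, grouped to the left."""
    return reduce(Concat, filters)


def match(v: Any, f: Any, env: Env) -> bool:
    if isinstance(f, Exists):
        return True
    if isinstance(f, Bind):
        env[f.name] = v
        return True
    if isinstance(f, Equal):
        return v == f.value
    if isinstance(f, SeqF):
        return (isinstance(v, list) and len(v) == len(f.items)
                and all(match(x, g, env) for x, g in zip(v, f.items)))
    if isinstance(f, Concat):
        if not isinstance(v, (str, list)):
            return False     # in this sketch ++ only decomposes strings and lists
        # Try every decomposition v = v[:i] + v[i:], shortest left part first.
        return any(match(v[:i], f.left, env) and match(v[i:], f.right, env)
                   for i in range(len(v) + 1))
    raise TypeError(f"unsupported filter {f!r}")
```

Bindings made while exploring a decomposition that later fails are not undone, which is consistent with the statement above that the environment may be modified whatever the result.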

For illustrating the advantages of the present structural pattern matching
scheme, more complex examples are now given for strings, tuples,
sequences, records, multisets and dictionaries.

e # ? ++ %'r' ++ ?x

will match any string containing 'r' and assign the substring beginning after the
first occurrence of 'r' to variable x: if e = 'circus', then x will be assigned 'cus'; if
e = 'red', then x will be assigned 'ed'.

e # [<f1=?x>] ++ ?

will match any sequence containing a record as its first element. This
record must contain a field f1; the value of this field will be assigned to variable
x.

[<f1=10>, <f2='s'>, <f1=3>] # [<f1=?x>] ++ ?y

will succeed; x will be assigned 10 and y will be assigned [<f2='s'>, <f1=3>].

e # {? ++ [<?, ?x>]} ++ ?

will match any multiset containing at least one sequence whose last element is
a tuple made up of two elements, and will assign the second element of that
last tuple to variable x. It selects the last tuple of a sequence because the
pattern structure includes ? ++ [<?, ?x>] instead of [<?, ?x>] ++ ?, which
would have selected the first tuple.

{'ab', 'cd', 'ef'} # {%'ef', %'ab', %'cd'}

will match, since the order is irrelevant in multisets.
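Records inside sequences and order-insensitive multiset filters can be sketched in the same style. The example assumes `Record` and `Multiset` subclasses of `dict` and `tuple` used as type tags, a record filter that lists exactly the fields of the record, and a brute-force search over permutations for {f1, ..., fn}; these are illustrative choices, not the described implementation.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Any, Dict, Tuple

Env = Dict[str, Any]


class Record(dict):
    """<name1 = e1, ...>: a record, kept distinct from plain dictionaries."""


class Multiset(tuple):
    """{e1, e2, ...}: element order carries no meaning."""


@dataclass(frozen=True)
class Bind:
    """?x"""
    name: str


@dataclass(frozen=True)
class Equal:
    """%e (operand already evaluated)"""
    value: Any


@dataclass(frozen=True)
class SeqF:
    """[f1, ..., fn]"""
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class RecF:
    """<name1 = f1, ...>: assumed to list exactly the record's fields."""
    fields: Dict[str, Any]


@dataclass(frozen=True)
class MSetF:
    """{f1, ..., fn}: each filter must match a distinct element, in any order."""
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class Concat:
    """f1 ++ f2"""
    left: Any
    right: Any


def match(v: Any, f: Any, env: Env) -> bool:
    if isinstance(f, Bind):
        env[f.name] = v
        return True
    if isinstance(f, Equal):
        return v == f.value
    if isinstance(f, SeqF):
        return (isinstance(v, list) and len(v) == len(f.items)
                and all(match(x, g, env) for x, g in zip(v, f.items)))
    if isinstance(f, RecF):
        return (isinstance(v, Record) and set(v) == set(f.fields)
                and all(match(v[k], g, env) for k, g in f.fields.items()))
    if isinstance(f, MSetF):
        # Try every assignment of elements to filters (exponential; fine for a sketch).
        return (isinstance(v, Multiset) and len(v) == len(f.items)
                and any(all(match(x, g, env) for x, g in zip(p, f.items))
                        for p in permutations(v)))
    if isinstance(f, Concat):
        return isinstance(v, (str, list)) and any(
            match(v[:i], f.left, env) and match(v[i:], f.right, env)
            for i in range(len(v) + 1))
    raise TypeError(f"unsupported filter {f!r}")
```

With `data = [Record(f1=10), Record(f2='s'), Record(f1=3)]`, the filter `Concat(SeqF((RecF({'f1': Bind('x')}),)), Bind('y'))` binds x to 10 and y to the remaining two records, as in the example above.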



Pattern matching can further be used to retrieve values from a dictionary.
Suppose dict is a dictionary that maps strings to sequences. Then

dict # {'k1'=?} ++ ?

will succeed if dict contains a key 'k1', and

dict # {'k2'=?x} ++ ?

will succeed if dict contains a key 'k2' and will assign the corresponding
sequence to variable x.
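For dictionaries, a decomposition d = d1 + d2 can be taken as a split of the key/value pairs into two disjoint parts. The sketch below assumes exactly that (decompositions with overlapping keys are ignored) and enumerates every split; `DictF`, `splits` and the other names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Any, Dict, Iterator, Tuple

Env = Dict[str, Any]


@dataclass(frozen=True)
class Exists:
    """?"""


@dataclass(frozen=True)
class Bind:
    """?x"""
    name: str


@dataclass(frozen=True)
class DictF:
    """{key1 = f1, ...}: matches a dictionary with exactly these keys."""
    entries: Dict[Any, Any]


@dataclass(frozen=True)
class Concat:
    """f1 ++ f2"""
    left: Any
    right: Any


def splits(d: dict) -> Iterator[Tuple[dict, dict]]:
    """All decompositions d = d1 + d2 into disjoint sets of key/value pairs."""
    keys = list(d)
    for r in range(len(keys) + 1):
        for chosen in combinations(keys, r):
            yield ({k: d[k] for k in chosen},
                   {k: d[k] for k in keys if k not in chosen})


def match(v: Any, f: Any, env: Env) -> bool:
    if isinstance(f, Exists):
        return True
    if isinstance(f, Bind):
        env[f.name] = v
        return True
    if isinstance(f, DictF):
        return (isinstance(v, dict) and set(v) == set(f.entries)
                and all(match(v[k], g, env) for k, g in f.entries.items()))
    if isinstance(f, Concat):
        return isinstance(v, dict) and any(
            match(d1, f.left, env) and match(d2, f.right, env)
            for d1, d2 in splits(v))
    raise TypeError(f"unsupported filter {f!r}")
```

Enumerating all splits is exponential in the number of keys; a practical implementation would presumably look up the keys named in the filter directly, but the exhaustive version stays closest to the ++ semantics.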

Since pattern matching operations return boolean values, they can be
composed using logical connectors. It is then possible to express complex
filters, or filters which succeed only if the subject does not match them. There
are three composition operators: "not", "or" and "and".

"not" will succeed if the subject does not match the filter. For instance,

e # not(? ++ %'r' ++ ?)

will succeed only if string e does not contain the character 'r'.

"or" takes two filter arguments, which are matched from left to right. For
instance,

e # %'c' ++ ? or ? ++ %'r' ++ ?

will succeed only if string e begins with the character 'c' or contains the
character 'r'.



"and" works in a similar way as "or". It succeeds if the expression matches 
both filters. For instance, 

e # <?, ?> and ?x 
allows checking that e is a two-element tuple and assigns e to x. 
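Because a match returns a boolean, the three connectives map directly onto Python's `not`, `or` and `and`, which also evaluate left to right and short-circuit. A minimal sketch for strings and tuples, with hypothetical class names:

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict, Tuple

Env = Dict[str, Any]


@dataclass(frozen=True)
class Exists:
    """?"""


@dataclass(frozen=True)
class Bind:
    """?x"""
    name: str


@dataclass(frozen=True)
class Equal:
    """%e (operand already evaluated)"""
    value: Any


@dataclass(frozen=True)
class TupleF:
    """<f1, ..., fn>"""
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class Concat:
    """f1 ++ f2"""
    left: Any
    right: Any


@dataclass(frozen=True)
class Not:
    inner: Any


@dataclass(frozen=True)
class Or:
    left: Any
    right: Any


@dataclass(frozen=True)
class And:
    left: Any
    right: Any


def cat(*filters: Any) -> Any:
    return reduce(Concat, filters)


def match(v: Any, f: Any, env: Env) -> bool:
    if isinstance(f, Exists):
        return True
    if isinstance(f, Bind):
        env[f.name] = v
        return True
    if isinstance(f, Equal):
        return v == f.value
    if isinstance(f, TupleF):
        return (isinstance(v, tuple) and len(v) == len(f.items)
                and all(match(x, g, env) for x, g in zip(v, f.items)))
    if isinstance(f, Concat):
        return isinstance(v, (str, list)) and any(
            match(v[:i], f.left, env) and match(v[i:], f.right, env)
            for i in range(len(v) + 1))
    if isinstance(f, Not):
        return not match(v, f.inner, env)
    if isinstance(f, Or):       # the right filter is tried only if the left one fails
        return match(v, f.left, env) or match(v, f.right, env)
    if isinstance(f, And):      # the right filter sees bindings made by the left one
        return match(v, f.left, env) and match(v, f.right, env)
    raise TypeError(f"unsupported filter {f!r}")
```

For instance, `match(('a', 1), And(TupleF((Exists(), Exists())), Bind('x')), env)` succeeds and assigns the whole tuple to x, as in the "and" example above.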
There are also provided "Kleene" operators * and +, which work with the known
semantics "zero or more elements" and "at least one element", respectively.
For instance,

e # %'abb' ++ (%'c')* ++ %'dee'

will match any of the following strings: 'abbdee', 'abbcdee' and 'abbcccdee'.
The same filter using the second operator,

e # %'abb' ++ (%'c')+ ++ %'dee'

will match the same set of strings except 'abbdee'. The Kleene operators can
be used with strings, sequences, sets and dictionaries.
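The Kleene operators can be sketched with the same decomposition search. This version assumes that each iteration of f* consumes a non-empty part of the subject (so the recursion terminates) and implements f+ by rewriting it to f ++ f*; only strings and lists are handled, and the class names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict

Env = Dict[str, Any]


@dataclass(frozen=True)
class Equal:
    """%e (operand already evaluated)"""
    value: Any


@dataclass(frozen=True)
class Concat:
    """f1 ++ f2"""
    left: Any
    right: Any


@dataclass(frozen=True)
class Star:
    """f*"""
    inner: Any


@dataclass(frozen=True)
class Plus:
    """f+"""
    inner: Any


def cat(*filters: Any) -> Any:
    return reduce(Concat, filters)


def match(v: Any, f: Any, env: Env) -> bool:
    if isinstance(f, Equal):
        return v == f.value
    if isinstance(f, Concat):
        return isinstance(v, (str, list)) and any(
            match(v[:i], f.left, env) and match(v[i:], f.right, env)
            for i in range(len(v) + 1))
    if isinstance(f, Star):
        if len(v) == 0:
            return True                                    # ε # f* -> true
        return any(match(v[:i], f.inner, env) and match(v[i:], f, env)
                   for i in range(1, len(v) + 1))          # non-empty first chunk
    if isinstance(f, Plus):
        return match(v, Concat(f.inner, Star(f.inner)), env)   # f+ rewritten as f ++ f*
    raise TypeError(f"unsupported filter {f!r}")
```

With `star = cat(Equal('abb'), Star(Equal('c')), Equal('dee'))`, the strings 'abbdee', 'abbcdee' and 'abbcccdee' all match; replacing `Star` by `Plus` rejects 'abbdee' only.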

As shown above, the described pattern matching technique enables the
symmetrical construction of both pattern matching structures and data
structures, so that they reach the same arbitrary level of structural complexity.
Further, the matching operation is invoked explicitly, through a dedicated
operator. A single matching operation tests the general form of the structure to
be matched as well as the contained (sub-)structures, and assigns part(s) of the
matching structure to variable(s) of the execution environment.

The technique relates to a number of fields, including language
construction, control abstractions and transformation models, rewriting
systems theory, term rewriting systems, transformation languages or systems
for compilation and language processing, structured document transformation,
tree pattern matching, explicit or automatic document transformation systems,
and so on.

Further, the technique may serve for building transformation models that are
less abstract and more general than rewrite systems, which perform implicit
pattern matching and apply built-in strategies for rule application. On the other
hand, the technique is abstract enough to simplify and extend general
purpose programming languages. Thus, the technique may play an important
role in the design of new transformation techniques or the extension of existing
ones.

Four basic topics are addressed in a single approach that is different from, and
independent of, existing approaches, which are currently and conceptually less
efficient: the definition of a data model, the definition of matching operations,
the definition of a transformation model, and the integration of the
transformation model in a programming language.



The definition of a data model gives a precise form to the notion of data
structures, which need to be rich and clear for the programmer but at the same
time kept as simple as possible. The computational properties of each kind of
structure should be stressed to reach completeness at minimal cost. For
instance, sequences are ordered structures that are useful for handling stacks
and FIFOs, or for memorizing intermediate nodes when walking through a tree,
whereas multisets are useful for managing "bags" of unorganized data.

The technique described above proposes a "universal" data model based on a 
synthesis of the most commonly used data types. These are chosen in order 
to avoid redundancy and to enable the construction of heterogeneous 
structures of arbitrary complexity while being kept simple and explicit. 
However, applying the technique does not require using exactly the proposed 
type set, but only applying the symmetrical construction principle, both to data 
structures and to pattern structures. 

The definition of matching operations is closely related to the previous point.
It covers three fundamental issues: identifying and testing the "form" of the
structure (step 200 in FIG. 2); checking the value of the structure or a part of it
(step 210); and extracting a part of the information stored in the structure (step
220).

The technique allows for building pattern matching structures in such a way
that the required form is made explicit, and thus clear to the programmer who
specifies it. Parts of the subject structure that are not of interest are
abstracted through a basic existence filter noted "?". Values inside the
structure can be checked as equal to arbitrarily complex values, thanks to a
basic testing filter noted "%e", e being a potentially complex expression.
Finally, information located in the subject structure can be extracted and
copied to any variable that is available in the execution environment, by using
a basic assignment filter, noted "?x", where x is a variable name.
The definition of a transformation model starts from the finding that matching
operations are just intermediate steps of realistic structure transformation 
processes and should therefore be put in a more general context: How are 
transformation steps described, chained and ordered? How are structures 
scanned and what is the result of the overall transformation? How is the 
contextual information extracted that is used during matching operations: is it 
simply transferred into the output structure(s) and/or does it drive the 
transformation itself? 

The explicit invocation of the matching operation, according to the present
technique, allows it to be seen as a boolean evaluation. The transformation
model can therefore take a useful form, depending on the requirements.
Transformations can cascade matching operations by using rules like
$e_1 \,\#\, f_1 \Rightarrow e_2 \,\#\, f_2$, or be composed sequentially or through complex networks of
"if-then-else" statements, possibly comprising computation on the extracted
context. Output structures, the result of the transformation, are constructed in
a coherent way by using the available data models and contextual information,
possibly transformed by other computational means.

Depending on the definition of the transformation model and the expected 
computational power of the transformation, one can consider either to extend 
the pattern matching with language constructs to specify internal computation, 
as a complement to matching operations, or to integrate the transformation 
model itself into a more general programming language, as an extension 
which increases expressiveness. Both options are made possible by this 
technique, thanks to the explicit matching operator and the few general, and 
modular, assumptions regarding the underlying language. 

Considering the above observations, it will be appreciated that the described 
pattern matching technique is in particular suited for supporting general typing 
mechanisms. Type checking improves the global reliability of programming 
languages, and provides efficient means for runtime optimization. 
Pattern Matching Syntax and Operational Semantics

As noted above, the general abstract syntax uses e for expressions and f for
pattern matching filters. Matching operations are invoked through the operator
e # f, where the left expression e is the subject and the right operand f is the
filter or pattern.

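There are provided four basic operators plus combinators (or filter connectors):

$$
\begin{array}{rcll}
f & ::= & ? & \text{existence test} \\
  & \mid & ?x & \text{existence test and assignment to variable } x \\
  & \mid & \%e & \text{equality check} \\
  & \mid & @f & \text{reference filter} \\[4pt]
f & ::= & f \mathbin{++} f & \text{filter composition} \\
  & \mid & f^{*} \mid f^{+} & \text{Kleene-like operators} \\
  & \mid & f\ \text{and}\ f \mid f\ \text{or}\ f \mid \text{not}\ f & \text{boolean (ordered) connectives}
\end{array}
$$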

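The first two filters, ? and ?x, are in normal form. The computation of the
equality check and of reference filters requires the computation of their
embedded expression or filter:

$$\frac{\Gamma \vdash e \to \Gamma' \vdash e'}{\Gamma \vdash \%e \to \Gamma' \vdash \%e'} \qquad \frac{\Gamma \vdash f \to \Gamma' \vdash f'}{\Gamma \vdash @f \to \Gamma' \vdash @f'}$$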



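By using a meta symbol $\star \in \{\mathbin{++}, \text{and}, \text{or}\}$, we define a family of equations
valid for all binary combinators:

$$\frac{\Gamma \vdash f_1 \to \Gamma' \vdash f_1'}{\Gamma \vdash f_1 \star f_2 \to \Gamma' \vdash f_1' \star f_2}\ [\text{f-left}] \qquad \frac{\Gamma \vdash f_2 \to \Gamma' \vdash f_2'}{\Gamma \vdash \mathbf{f}_1 \star f_2 \to \Gamma' \vdash \mathbf{f}_1 \star f_2'}\ [\text{f-right}]$$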



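The reduction of the Kleene-like operator f* preserves the structure of the filter
([f*]), whereas the computation of f+ is a pure rewriting ([f+]):

$$\frac{\Gamma \vdash f \to \Gamma' \vdash f'}{\Gamma \vdash f^{*} \to \Gamma' \vdash f'^{*}}\ [\text{f}^{*}] \qquad f^{+} \to f \mathbin{++} f^{*}\ [\text{f}^{+}]$$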



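In the semantics of filter application (matching), v and w denote reduced
terms:

$$v \,\#\, ? \to \mathit{true}\ [\text{m-free2}] \qquad \Gamma, x^{v} \vdash w \,\#\, ?x \to \Gamma, x^{w} \vdash \mathit{true}\ [\text{m-free1}]$$

$$\frac{\Gamma, r^{v} \vdash v \,\#\, f \to \Gamma' \vdash \mathit{true}}{\Gamma, r^{v} \vdash r \,\#\, @f \to \Gamma' \vdash \mathit{true}} \qquad \frac{v_1 = v_2 \to \mathit{true}/\mathit{false}}{v_1 \,\#\, \%v_2 \to \mathit{true}/\mathit{false}}\ [\text{m-}\%]$$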



Note that [m-%] again uses the structural equality on structures. This
equality is defined for all considered structures as $s_1 = s_2$ iff $s_1 \subseteq s_2$
and $s_1 \supseteq s_2$. The inclusion itself is straightforward for strings, sequences and
multisets; for dictionaries, it corresponds to the inclusion of key/value pairs.
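The inclusion-based definition of structural equality can be checked directly for multisets and dictionaries. In this sketch multisets are `collections.Counter` objects; the function names are hypothetical, and strings and sequences are left out because the text does not spell out which notion of inclusion it uses for them.

```python
from collections import Counter
from typing import Any, Callable, Mapping


def multiset_included(m1: Counter, m2: Counter) -> bool:
    """m1 ⊆ m2: every element occurs in m2 at least as often as in m1."""
    return all(m2[x] >= n for x, n in m1.items())


def dict_included(d1: Mapping, d2: Mapping) -> bool:
    """d1 ⊆ d2: every key/value pair of d1 is also a key/value pair of d2."""
    return all(k in d2 and d2[k] == v for k, v in d1.items())


def structurally_equal(s1: Any, s2: Any,
                       included: Callable[[Any, Any], bool]) -> bool:
    """s1 = s2 iff s1 ⊆ s2 and s1 ⊇ s2."""
    return included(s1, s2) and included(s2, s1)
```

For example, `Counter('aab')` and `Counter('aba')` are structurally equal, whereas `Counter('ab')` is only included in `Counter('aab')`.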



The composition combinator ++ applies only to strings, sequences, multisets 
or dictionaries. It can be abstracted over these various data structures, noted 
s, by using the + operator which models concatenation for strings and 
sequences, disjoint union for multisets and non-commutative union for records 
and dictionaries. As keys must be unique in a dictionary, the key/value pairs 
of the right hand operand override the key/value pairs of the left hand operand, 
if required. This does not impact the semantics of the related matching 
The following definition uses again the equality relation on 



operation, 
structures; 



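$$\frac{\exists s_1, s_2 \mid s_1 + s_2 = s \qquad s_1 \# f_1 \to \mathrm{true} \qquad s_2 \# f_2 \to \mathrm{true}}{s \# f_1 \mathbin{++} f_2 \to \mathrm{true}}\ [m\text{-fadd1}]$$

$$\frac{\forall s_1, s_2 \text{ such that } s_1 + s_2 = s:\quad s_1 \# f_1 \to \mathrm{false} \ \text{ or } \ s_2 \# f_2 \to \mathrm{false}}{s \# f_1 \mathbin{++} f_2 \to \mathrm{false}}\ [m\text{-fadd2}],\ [m\text{-fadd3}]$$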
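Read operationally, matching s against f1 ++ f2 is a search over every way of splitting s into s1 + s2. The sketch below illustrates this for strings only; it assumes, for simplicity, that filters are plain Python predicates and it ignores computing environments. The names any_, lit and concat are hypothetical.

```python
from typing import Callable

Filter = Callable[[str], bool]  # illustrative: a filter is a predicate on strings


def any_(s: str) -> bool:
    """The '?' filter: matches any string."""
    return True


def lit(value: str) -> Filter:
    """The '%e' filter: succeeds iff the subject equals value."""
    return lambda s: s == value


def concat(f1: Filter, f2: Filter) -> Filter:
    """f1 ++ f2: succeeds iff SOME split s = s1 + s2 has s1 # f1 and s2 # f2.

    It fails only when every split fails, i.e. for all s1, s2 with
    s1 + s2 = s, s1 # f1 or s2 # f2 is false.
    """
    def match(s: str) -> bool:
        return any(f1(s[:i]) and f2(s[i:]) for i in range(len(s) + 1))
    return match
```

For instance, concat(lit("199"), any_) plays the role of the filter %'199'++? used later on book years.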



Similarly, in the following definition, which applies only to strings, sequences, sets and dictionaries, the term ε is an abstraction over the empty string, empty sequence, empty set and empty dictionary:

$$\frac{s \neq \varepsilon \qquad \exists s_1, s_2 \mid s = s_1 + s_2 \qquad s_1 \# f \to \mathrm{true} \qquad s_2 \# f^{*} \to \mathrm{true}}{s \# f^{*} \to \mathrm{true}}\ [m*]$$

$$\varepsilon \# f^{*} \to \mathrm{true}\ [m*b]$$
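Rules [m*] and [m*b] can be turned into a memoized search for strings. The following is a sketch under stated assumptions: filters are plain predicates, environments are ignored, and the first chunk s1 is required to be non-empty so that the recursion always terminates (the rule itself does not impose this). The name star is hypothetical.

```python
from functools import lru_cache
from typing import Callable

Filter = Callable[[str], bool]  # illustrative: a filter is a predicate on strings


def star(f: Filter) -> Filter:
    """f* on strings.

    [m*b]: the empty string always matches f*.
    [m*]:  a non-empty s matches f* if s = s1 + s2 with s1 # f and s2 # f*.
    Assumption of this sketch: s1 must be non-empty, so recursion terminates.
    """
    def match(s: str) -> bool:
        @lru_cache(maxsize=None)
        def suffix_matches(i: int) -> bool:   # does s[i:] match f* ?
            if i == len(s):
                return True                    # empty remainder: [m*b]
            return any(f(s[i:j]) and suffix_matches(j)
                       for j in range(i + 1, len(s) + 1))
        return suffix_matches(0)
    return match
```

Memoizing on the suffix start index keeps the search polynomial in the length of s instead of exploring every decomposition repeatedly.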



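In the following equations, the order of operands is important, and computing environments can be altered by unsuccessful operations. This enables more efficient implementations, since no specific processing is required in order to preserve the original environment:

$$\frac{\Gamma \vdash v \# f_1 \to \Gamma' \vdash \mathrm{true} \qquad \Gamma' \vdash v \# f_2 \to \Gamma'' \vdash \mathrm{true}}{\Gamma \vdash v \# f_1\ \mathrm{and}\ f_2 \to \Gamma'' \vdash \mathrm{true}}\ [m\text{-and1}] \qquad \frac{\Gamma \vdash v \# f_1 \to \Gamma' \vdash \mathrm{false}}{\Gamma \vdash v \# f_1\ \mathrm{and}\ f_2 \to \Gamma' \vdash \mathrm{false}}\ [m\text{-and2}]$$

$$\frac{\Gamma \vdash v \# f_1 \to \Gamma' \vdash \mathrm{true} \qquad \Gamma' \vdash v \# f_2 \to \Gamma'' \vdash \mathrm{false}}{\Gamma \vdash v \# f_1\ \mathrm{and}\ f_2 \to \Gamma'' \vdash \mathrm{false}}\ [m\text{-and3}]$$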



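A similar behavior is defined for "or" filters:

$$\frac{\Gamma \vdash v \# f_1 \to \Gamma' \vdash \mathrm{true}}{\Gamma \vdash v \# f_1\ \mathrm{or}\ f_2 \to \Gamma' \vdash \mathrm{true}}\ [m\text{-or1}] \qquad \frac{\Gamma \vdash v \# f_1 \to \Gamma' \vdash \mathrm{false} \qquad \Gamma' \vdash v \# f_2 \to \Gamma'' \vdash \mathrm{true}}{\Gamma \vdash v \# f_1\ \mathrm{or}\ f_2 \to \Gamma'' \vdash \mathrm{true}}\ [m\text{-or2}]$$

$$\frac{\Gamma \vdash v \# f_1 \to \Gamma' \vdash \mathrm{false} \qquad \Gamma' \vdash v \# f_2 \to \Gamma'' \vdash \mathrm{false}}{\Gamma \vdash v \# f_1\ \mathrm{or}\ f_2 \to \Gamma'' \vdash \mathrm{false}}\ [m\text{-or3}]$$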





Similarly, the "not" filter is defined as:

$$\frac{\Gamma \vdash v \# f \to \Gamma' \vdash \mathrm{true/false}}{\Gamma \vdash v \# \mathrm{not}\ f \to \Gamma' \vdash \mathrm{false/true}}\ [m\text{-not}]$$
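The and/or/not matching rules thread one environment through both operands and never undo changes made by a failed operand. A minimal Python sketch, assuming (for illustration) that the environment is a mutable dict and that a filter is a function (value, env) -> bool which may update env in place; var, eq, and_, or_ and not_ are hypothetical names:

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]                  # the computing environment
Filter = Callable[[Any, Env], bool]   # may update env in place


def var(name: str) -> Filter:
    """?x: always succeeds and binds x to the subject."""
    def match(v: Any, env: Env) -> bool:
        env[name] = v
        return True
    return match


def eq(value: Any) -> Filter:
    """%e: succeeds iff the subject equals value; env is left unchanged."""
    return lambda v, env: v == value


def and_(f1: Filter, f2: Filter) -> Filter:
    # f2 runs only if f1 succeeded, in the environment f1 produced.
    return lambda v, env: f1(v, env) and f2(v, env)


def or_(f1: Filter, f2: Filter) -> Filter:
    # f2 runs only if f1 failed; changes made by f1 are NOT undone.
    return lambda v, env: f1(v, env) or f2(v, env)


def not_(f: Filter) -> Filter:
    # Inverts the result and keeps whatever environment f produced.
    return lambda v, env: not f(v, env)
```

Matching 4 against and_(var("x"), eq(3)) fails, yet x stays bound to 4: this is exactly the "altered by unsuccessful operations" behavior that avoids any restore cost.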



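Of course, backtracking might be convenient in cases where the matching has failed but has still left the environment modified. A solution by means of an explicit restoring operation ∇(f) may have the following semantics:

$$\frac{\Gamma \vdash v \# f \to \Gamma' \vdash \mathrm{true}}{\Gamma \vdash v \# \nabla(f) \to \Gamma' \vdash \mathrm{true}} \qquad \frac{\Gamma \vdash v \# f \to \Gamma' \vdash \mathrm{false}}{\Gamma \vdash v \# \nabla(f) \to \Gamma \vdash \mathrm{false}}$$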
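A restoring operation can be pictured as a wrapper that snapshots the environment before matching and rolls it back only on failure. This is a sketch under the same illustrative assumptions as a dict environment and (value, env) -> bool filters; restoring and bind_then_fail are hypothetical names, and the snapshot strategy (a deep copy) is a design choice of this sketch, not something the text prescribes.

```python
import copy
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Filter = Callable[[Any, Env], bool]   # may update env in place


def restoring(f: Filter) -> Filter:
    """Hypothetical explicit restoring operation around filter f.

    On success the environment produced by f is kept; on failure the
    environment is rolled back to the state it had before matching.
    """
    def match(v: Any, env: Env) -> bool:
        snapshot = copy.deepcopy(env)   # environment before matching
        if f(v, env):
            return True                 # keep the modified environment
        env.clear()
        env.update(snapshot)            # restore the original environment
        return False
    return match


def bind_then_fail(v: Any, env: Env) -> bool:
    """Demo filter: binds x, then fails (leaving a modified environment)."""
    env["x"] = v
    return False
```

The cost of the snapshot is paid on every use, which is why making restoration explicit, rather than built into every filter, keeps the default matching cheap.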



The symmetrical construction of data and filter structures will now be 
described in more detail in the examples of tuples, sequences, multisets, 
dictionaries and records. 



Tuples make it possible to handle heterogeneous structures of fixed length. The present description presents only constructors. Other common operations on tuple structures are access to items through indexing notations such as t[i], in order to fetch the i-th item, and assignment t[i] := e, in order to set the i-th item, provided that i lies within the actual size of the tuple. Tuples are particularly useful for computing Cartesian products, as done in relational algebra and related languages:

e ::= <e0, ..., ek>     tuple construction (k+1 items)

f ::= <f0, ..., fk>     filter for tuples of k+1 elements



In the following, ē is used as a shorthand for e0, ..., ek. The computation of a tuple (or a filter tuple) is defined by the successive computation of all subterms, in the order of occurrence. A tuple or filter tuple is reduced when all subterms or sub-filters are reduced:

$$\frac{e_i \to e_i'}{\langle v_0, \dots, v_{i-1}, e_i, \dots, e_k \rangle \to \langle v_0, \dots, v_{i-1}, e_i', \dots, e_k \rangle}\ [tuple]$$



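Matching for tuples is performed by matching all sub-filters to the corresponding substructures. If the subject v is not a tuple, or if the cardinalities are different, the operation returns "false" ([m-tuple]):

$$\frac{\Gamma^{(i-1)} \vdash v_i \# f_i \to \Gamma^{(i)} \vdash \mathrm{true} \quad (i = 0, \dots, k;\ \Gamma^{(-1)} = \Gamma)}{\Gamma \vdash \langle v_0, \dots, v_k \rangle \# \langle f_0, \dots, f_k \rangle \to \Gamma^{(k)} \vdash \mathrm{true}}$$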

Note that any modifications of the execution context are propagated through the whole matching process.
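Tuple matching is a left-to-right traversal that stops at the first failing sub-filter, threading the same environment from one sub-match to the next. A minimal sketch, assuming Python tuples as subjects, a dict environment and (value, env) -> bool filters; tuple_filter, var and eq are hypothetical names:

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Filter = Callable[[Any, Env], bool]   # may update env in place


def tuple_filter(*subfilters: Filter) -> Filter:
    """<f0, ..., fk>: matches a tuple of k+1 items, item by item, left to right."""
    def match(v: Any, env: Env) -> bool:
        if not isinstance(v, tuple) or len(v) != len(subfilters):
            return False                # not a tuple, or cardinalities differ
        # Each sub-match runs in the environment left by the previous one;
        # all() stops at the first failing sub-filter.
        return all(f(item, env) for f, item in zip(subfilters, v))
    return match


def var(name: str) -> Filter:
    """?x: always succeeds and binds x."""
    def match(v: Any, env: Env) -> bool:
        env[name] = v
        return True
    return match


def eq(value: Any) -> Filter:
    """%e: succeeds iff the item equals value."""
    return lambda v, env: v == value
```

For example, tuple_filter(eq(12), var("x")) corresponds to the filter <%12,?x>.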

Sequences, multisets and dictionaries are defined as shown below:

e ::= [e0, ..., ek] | {e0, ..., ek}     (ordered) sequences, multisets

  ::= {e0 = e0', ..., ek = ek'}         dictionaries (unordered key/value pairs)

  ::= [] | {} | {=}                     empty sequence, multiset, dictionary

f ::= [f0, ..., fk] | {f0, ..., fk}     sequence/multiset filters (k ≥ 0)

  ::= {f0 = f0', ..., fk = fk'}         dictionary filters (k ≥ 0)



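$$\frac{e_i \to e_i'}{[v_0, \dots, v_{i-1}, e_i, \dots, e_k] \to [v_0, \dots, v_{i-1}, e_i', \dots, e_k]}\ [seq] \qquad \frac{f_i \to f_i'}{[f_0, \dots, f_i, \dots, f_k] \to [f_0, \dots, f_i', \dots, f_k]}\ [f\text{-seq}]$$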
Equivalent equations for sets, called [set], [set2], [f-set] and [f-set2], are strictly similar to the previous ones:

$$\frac{e_i \to e_i'}{\{v_0, \dots, v_{i-1}, e_i, \dots, e_k\} \to \{v_0, \dots, v_{i-1}, e_i', \dots, e_k\}}\ [set] \qquad \frac{f_i \to f_i'}{\{f_0, \dots, f_i, \dots, f_k\} \to \{f_0, \dots, f_i', \dots, f_k\}}\ [f\text{-set}]$$

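When matching sets, $S^{[0,k]}$ denotes the set {v0, ..., vk} and $SF^{[0,k]}$ the filter {f0, ..., fk}. If i ∈ [0,k], then $S^{[0,k]-\{i\}}$ is equivalent to {v0, ..., v(i-1), v(i+1), ..., vk}, and similarly for $SF^{[0,k]-\{i\}}$. Using this notation, the matching over sets is recursively defined as:

$$\frac{\exists i \in [0,k] \text{ such that } \Gamma \vdash v_i \# f_0 \to \Gamma' \vdash \mathrm{true} \qquad \Gamma' \vdash S^{[0,k]-\{i\}} \# SF^{[1,k]} \to \Gamma'' \vdash \mathrm{true}}{\Gamma \vdash S^{[0,k]} \# SF^{[0,k]} \to \Gamma'' \vdash \mathrm{true}}\ [m\text{-set}]$$

This equation does not express any ordering in exploring the search space. It just requires that each value in S matches a corresponding filter in SF. Note that the computing environment can be modified throughout matching operations.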
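One concrete way to explore the search space for set matching is backtracking: pair the first filter with some value, then match the remaining values against the remaining filters. The sketch below assumes multisets are given as Python lists, a dict environment and (value, env) -> bool filters. Unlike the in-place semantics above, it works on a copy of the environment for each attempt, a design choice that keeps failed branches from leaking bindings. match_multiset, var and eq are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional

Env = Dict[str, Any]
Filter = Callable[[Any, Env], bool]   # may update env in place


def match_multiset(values: List[Any], filters: List[Filter], env: Env) -> Optional[Env]:
    """{v0,...,vk} # {f0,...,fk}: find a one-to-one pairing of values and filters.

    Returns the resulting environment, or None if no pairing exists.
    """
    if len(values) != len(filters):
        return None
    if not filters:
        return env                                  # {} # {} succeeds
    first, rest = filters[0], filters[1:]
    for i, v in enumerate(values):
        trial = dict(env)                           # per-branch copy
        if first(v, trial):
            result = match_multiset(values[:i] + values[i + 1:], rest, trial)
            if result is not None:
                return result                       # first successful pairing
    return None                                     # every pairing failed


def var(name: str) -> Filter:
    """?x: always succeeds and binds x."""
    def match(v: Any, env: Env) -> bool:
        env[name] = v
        return True
    return match


def eq(value: Any) -> Filter:
    """%e: succeeds iff the value equals `value`."""
    return lambda v, env: v == value
```

With values [3, 1, 2] and filters [%1, ?x, %3], the first attempt binds x to 3 and then fails on %3, so the search backtracks and returns the binding x = 2.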



Dictionaries are computed by successive evaluation of key/value pairs, following the order of occurrence:

$$\frac{e_i \to e_i''}{\{v_0 = v_0', \dots, e_i = e_i', \dots, e_k = e_k'\} \to \{v_0 = v_0', \dots, e_i'' = e_i', \dots, e_k = e_k'\}}\ [dic]$$

$$\frac{e_i' \to e_i''}{\{v_0 = v_0', \dots, v_i = e_i', \dots, e_k = e_k'\} \to \{v_0 = v_0', \dots, v_i = e_i'', \dots, e_k = e_k'\}}\ [dic2]$$

The computation of filter dictionaries is strictly similar; the corresponding equations [f-dic] and [f-dic2] are therefore not shown here.

When matching dictionaries, $D^{[0,k]}$ denotes the dictionary {v0 = v0', ..., vk = vk'} and $DF^{[0,k]}$ the filter {f0 = f0', ..., fk = fk'}. Using this notation, the matching over dictionaries is recursively defined as:

$$\frac{\exists i \in [0,k] \text{ such that } \begin{cases} \Gamma \vdash v_i \# f_0 \to \Gamma' \vdash \mathrm{true} \\ \Gamma' \vdash v_i' \# f_0' \to \Gamma'' \vdash \mathrm{true} \\ \Gamma'' \vdash D^{[0,k]-\{i\}} \# DF^{[1,k]} \to \Gamma''' \vdash \mathrm{true} \end{cases}}{\Gamma \vdash D^{[0,k]} \# DF^{[0,k]} \to \Gamma''' \vdash \mathrm{true}}\ [m\text{-dic}]$$

This equation does not express any ordering in exploring the search space. It just requires that each key/value pair matches a corresponding filter-key/filter-value pair. Note that the computing environment can be modified throughout matching operations, and that keys are matched prior to values.


Records are unordered, heterogeneous collections of values, each of which is statically designated by a name called a "member". Standard operations on such data structures s are member access, e.g. a := s.m, and member assignment, e.g. s.m := 10. Records provide useful mnemonics for selecting a part of complex structures and are well known for their expressive richness in data modeling:

e ::= <m0 = e0, ..., mk = ek>     records (unordered member-name/value pairs) (k ≥ 0)

f ::= <m0 = f0, ..., mk = fk>     record filters (k ≥ 0)



The reduction of data and filters follows the order of occurrence:

$$\frac{e_i \to e_i'}{\langle m_0 = v_0, \dots, m_i = e_i, \dots, m_k = e_k \rangle \to \langle m_0 = v_0, \dots, m_i = e_i', \dots, m_k = e_k \rangle}\ [e\text{-rec}]$$

Filter records are similarly defined:

$$\frac{f_i \to f_i'}{\langle m_0 = f_0, \dots, m_i = f_i, \dots, m_k = f_k \rangle \to \langle m_0 = f_0, \dots, m_i = f_i', \dots, m_k = f_k \rangle}\ [f\text{-record}]$$



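Matching is not order sensitive, i.e. the ordering of filters can differ from the ordering of the subject, but filters are applied in their order of definition. Member names must be exactly identical:

$$\frac{\exists \text{ distinct } i_0, \dots, i_k \in [0..k] \text{ such that } \begin{cases} m_0' = m_{i_0} \text{ and } \Gamma \vdash v_{i_0} \# f_0 \to \Gamma^{(0)} \vdash \mathrm{true} \\ \quad\vdots \\ m_k' = m_{i_k} \text{ and } \Gamma^{(k-1)} \vdash v_{i_k} \# f_k \to \Gamma^{(k)} \vdash \mathrm{true} \end{cases}}{\Gamma \vdash \langle m_0 = v_0, \dots, m_k = v_k \rangle \# \langle m_0' = f_0, \dots, m_k' = f_k \rangle \to \Gamma^{(k)} \vdash \mathrm{true}}\ [m\text{-record}]$$

The matching fails if one of the sub-matchings fails:

$$\frac{\exists \text{ distinct } i_0, \dots, i_j \in [0..k],\ j \le k, \text{ such that } \begin{cases} m_0' = m_{i_0} \text{ and } \Gamma \vdash v_{i_0} \# f_0 \to \Gamma^{(0)} \vdash \mathrm{true} \\ \quad\vdots \\ m_j' = m_{i_j} \text{ and } \Gamma^{(j-1)} \vdash v_{i_j} \# f_j \to \Gamma^{(j)} \vdash \mathrm{false} \end{cases}}{\Gamma \vdash \langle m_0 = v_0, \dots, m_k = v_k \rangle \# \langle m_0' = f_0, \dots, m_k' = f_k \rangle \to \Gamma^{(j)} \vdash \mathrm{false}}\ [m\text{-record2}]$$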



Recursive Filters 

Recursive filters and the "do" operator will now be described in more detail.

Recursive filters enable the filtering of trees. For instance,

e # rec F = <%'plus', F, F> or <%'minus', F, F> or <?>

succeeds if e is a tree with nodes that are labeled by the strings 'plus' and 'minus' and with leaves that are tuples containing a single element (of any structure).

The "do" operator takes a filter as the left argument and an instruction (or sequence of instructions) as the right argument. The instructions are executed only if the filter is successful.

For instance,

e # <%12,?> do i:=i+1

increments i only if e is a tuple made of two elements, the first one being 12. This operator can be particularly useful in recursive filters.

Using "do", it is possible, for instance, to extract all the numbers from a tree based on the previous example and append them to a list n. The extended filter A is

rec F = <%'plus', F, F> or <%'minus', F, F> or (<?x> do n:=n+[x])

In a basic example such as

<'plus', <'minus', <1>, <2>>, <4>> # A

where the tree encodes (1-2)+4, the pattern matching operation succeeds, and after application n is equal to its previous value extended with [1,2,4], i.e. n+[1,2,4].
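The recursive filter A can be mimicked in Python as a sketch, assuming trees are nested Python tuples and the recursion variable F is simply a Python function that calls itself (standing in for the substitution-based recursion). make_tree_filter is a hypothetical name, and the "do" action is modelled as an append that runs only once the leaf filter has matched.

```python
from typing import Any, Callable, List


def make_tree_filter(n: List[Any]) -> Callable[[Any], bool]:
    """Hypothetical encoding of
        rec F = <%'plus', F, F> or <%'minus', F, F> or (<?x> do n := n + [x])
    """
    def F(v: Any) -> bool:
        # <%'plus', F, F> or <%'minus', F, F>
        if isinstance(v, tuple) and len(v) == 3 and v[0] in ("plus", "minus"):
            return F(v[1]) and F(v[2])
        # <?x> do n := n + [x]: the action runs only because the filter matched
        if isinstance(v, tuple) and len(v) == 1:
            n.append(v[0])       # in-place equivalent of n := n + [x]
            return True
        return False
    return F
```

Applied to the tree for (1-2)+4, encoded as ("plus", ("minus", (1,), (2,)), (4,)), the filter succeeds and appends 1, 2 and 4 to n in left-to-right order.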
The syntax and operational semantics of recursive filters and the "do" operator are described hereafter:

f ::= rec F = f | F     recursive filter and recursion variable

  ::= f do e            "do" operator (e is executed if f matches)

As described above, these filters are reduced by preserving their structure. Reduction of the "f do e" filter differs from reduction of the "%e" filter because the embedded expression e is not computed. It will be evaluated in the context of the matching operation, and only if required:

$$\frac{f \to f'}{\mathrm{rec}\ F = f \to \mathrm{rec}\ F = f'}\ [f\text{-rec}] \qquad \frac{f \to f'}{f\ \mathrm{do}\ e \to f'\ \mathrm{do}\ e}\ [f\text{-do}]$$

Recursive matching is defined by using a substitution operation, as in the standard β-reduction of the lambda calculus:

$$\frac{\Gamma \vdash v \# f[\mathrm{rec}\ F = f / F] \to \Gamma' \vdash \mathrm{true/false}}{\Gamma \vdash v \# \mathrm{rec}\ F = f \to \Gamma' \vdash \mathrm{true/false}}$$



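In the following rules [m-do] and [m-do2], e is evaluated only if the filtering operation is successful. In that case, the new context is used, thus allowing side effects such as the memorization of useful intermediate information:

$$\frac{\Gamma \vdash v \# f \to \Gamma' \vdash \mathrm{true} \qquad \Gamma' \vdash e \to \Gamma'' \vdash w}{\Gamma \vdash v \# f\ \mathrm{do}\ e \to \Gamma'' \vdash \mathrm{true}}\ [m\text{-do}] \qquad \frac{\Gamma \vdash v \# f \to \Gamma' \vdash \mathrm{false}}{\Gamma \vdash v \# f\ \mathrm{do}\ e \to \Gamma' \vdash \mathrm{false}}\ [m\text{-do2}]$$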



For illustration purposes, the technique described above is now applied to an example of managing a database of book references. The data is stored in a Set structure (an unordered collection of records) described hereafter.

const DBase = {
   < year = '1994',
     title = 'TCP/IP Illustrated',
     author = [ < last = 'Stevens', first = 'W.' > ],
     publisher = 'Addison-Wesley',
     price = 65.95
   >,
   < title = 'Advanced Programming in the Unix environment',
     year = '1992',
     author = [ < last = 'Stevens', first = 'W.' > ],
     publisher = 'Addison-Wesley',
     price = 65.95
   >,
   < year = '2000',
     title = 'Data on the Web',
     author = [ < last = 'Abiteboul', first = 'Serge' >,
                < last = 'Buneman', first = 'Peter' >,
                < last = 'Suciu', first = 'Dan' >
              ],
     publisher = 'Morgan Kaufmann Publishers',
     price = 39.95
   >,
   < year = '1999',
     title = 'The Economics of Technology and Content for Digital TV',
     editor = < last = 'Gerbarg', first = 'Darcy', affiliation = 'CITI' >,
     publisher = 'Kluwer Academic Publishers',
     price = 129.95
   >
}



Now, examples are given to extract, filter and recombine book information by using the filtering primitives and structured pattern constructors.

To find all books from Addison-Wesley that were published between 1990 and 1999, and to store their titles, a recursive function F1 is defined that returns the result in a Set:

function F1(x) is
   if
      x # { <title=?t, publisher=%'Addison-Wesley', year=%'199'++?> ++ ? } ++ ?y
   then
      return F1(y) + { <title = t> }
   else
      return { }

If F1 is called with the book database as parameter (written F1(DBase)), the result is

{ <title = 'Advanced Programming in the Unix environment'>,
  <title = 'TCP/IP Illustrated'> }

The rest of the information, e.g. the authors, is not returned in the result. The following variation, which is perhaps also more legible, returns the complete book records:

function F1(x) is
   if
      (x # {?book} ++ ?y) and
      (book # <title=?t, publisher=%'Addison-Wesley', year=%'199'++?> ++ ?)
   then
      return F1(y) + {book}
   else
      return { }

This last example shows the benefit of having explicit filtering operations: the first application is done on the parameter x and the second on "book". The equivalent solution below shows the benefit of the "and" filter; in the previous example, "and" is just a standard boolean connective.

function F1(x) is
   if
      x # { (<title=?t, publisher=%'Addison-Wesley', year=%'199'++?> ++ ?)
            and ?book
          } ++ ?y
   then
      return F1(y) + {book}
   else
      return { }

To find all publishers known in the database, the filters are used in another construction, the "for" loop, which has, for instance, the following syntax:

for f in e1 do e2

where f is a filter, e1 a computable expression that evaluates to a Set or a Sequence, and e2 any programming statement. This loop applies the filter to all elements of the Set/Sequence and, for each successful match, executes e2. The solution is then:

for <publisher=?p>++? in DBase do (R := R+{p})

The result of this program is found in R, R being assumed to be empty before execution:

R = {'Addison-Wesley',
     'Addison-Wesley',
     'Morgan Kaufmann Publishers',
     'Kluwer Academic Publishers'}
As can be seen, the Addison-Wesley publisher appears twice. One might want a real "projection" of the information, and for that purpose can use an additional filter:

for <publisher=?p>++? in DBase do (
   if not (R # {%p}++?) then R := R+{p}
)

The result of this program is as expected:

R = {'Addison-Wesley',
     'Morgan Kaufmann Publishers',
     'Kluwer Academic Publishers'}

To find all authors whose last names begin with 'A', the following construction can be used:

for <author=?s>++? in DBase do (
   for <last=(%'A'++? and ?auth)>++? in s do (
      if not (R # {%auth}++?) then R := R+{auth}
   )
)

The result will be {'Abiteboul'}.



It will be appreciated that the presented pattern matching technique is suitable either for designing programming languages specialized in data structure transformation, or for facilitating extensions to existing languages in order to handle such transformations. The construction of data structures and of pattern matching structures, i.e. filters, of arbitrary complexity uses symmetrical constructors, thereby allowing for great simplicity. The definition of basic filters, structured filters, recursive filters and logical combinators provides any programmer with means that have high expressive power and a good level of simplicity. A boolean operator is provided that explicitly applies the pattern matching structure to the subject data structure. Further, a formal operational semantics is provided that precisely defines the nature of filters and of the matching operation, as well as their relationship with any associated computational model. Moreover, this formalization offers a flexible theoretical framework that can help further integration.
BI-VALUATION OF PROGRAMMING STATEMENTS



By means of the statement evaluation section 140, the computer system can perform the process depicted in FIG. 4. A programming language statement may include a first and a second sub-statement. In step 400, the first sub-statement is evaluated, and depending on the decision in step 410, the second sub-statement may likewise be evaluated. As will be shown in more detail below, some statements require the evaluation of the second sub-statement, whereas for other statements the evaluation of the second sub-statement is not always necessary.

When the first and/or the second sub-statement is evaluated in steps 400, 420, an evaluation success result is determined if evaluation succeeds, or a distinguished value is returned if evaluation fails. In the following discussion, the distinguished value is noted "none".

The provision of a distinguished value in addition to the set of possible evaluation success results is called bi-valuation.

The bi-valuation technique makes it possible to combine imperative and declarative statements in one programming language. Basically, both imperative and declarative statements are evaluated when executed and return either "unit", a value, or "none". The value "unit" is returned by imperative statements, since imperative statements always succeed. Declarative statements return a value when they succeed and "none" when they fail.

It is therefore possible to combine both kinds of statements using so-called imperative connectors, like "Then" or "Else", whose semantics are based on the evaluation of the statements. Conditional, imperative and declarative statements and imperative connectors are described below in more detail. Further, it is demonstrated how imperative connectors are used for mixing statements.
In the following, s and si denote one or more imperative and/or declarative statements combined together by imperative connectors. The term "to evaluate s" means "to execute the statements in s and return the evaluation of the last statement in s". Expressions like e, ei can be literal constants, variables or basic operations like e + e, e * e.

Conditional statements, or if-then-else statements, are noted

if (e) then (s1) else (s2)

where e is a boolean expression. Depending on whether e evaluates to true or false, s1 or s2 is evaluated, respectively.

As opposed to declarative statements, imperative statements always succeed and evaluate to "unit". One example is the assignment, noted x := e, where x is a variable identifier and e is an expression. It assigns the result of the evaluation of e to x. Another example is the closure loop, noted *(s). The loop ends when s evaluates to "none". Its semantics is as follows:

if (s != none) then *(s) else unit

Declarative statements are based on conditions. The evaluation of a 
declarative statement returns a value if the condition(s) is verified, "none" 
otherwise. Examples are rules, ordered action systems and unordered action 
systems. 



The rule is noted e -> s, where the condition e can be a boolean or a pattern-matching expression. If the condition is verified, meaning that the pattern has been matched or that the boolean expression has evaluated to true, then the rule fires, triggering the evaluation of the right-hand side s. Rules are commonly used in rewriting systems by combining them in constructs similar to action systems.

The (ordered) action system is noted [| s1, s2, ..., sn |] and can be compared with a powerful Switch construct. It is used to combine rules. The action system evaluates them one by one until it finds one that does not evaluate to "none", i.e. a rule which fires and whose right-hand side statements do not evaluate to "none". The action system itself returns the result of the evaluation of these statements. Action systems can also contain an imperative statement in the last position, which acts as a default case since imperative statements always return "unit".

The unordered action system is noted {| s1, s2, ..., sn |} and does not guarantee the order of evaluation of rules, i.e. the order of declaration has no semantic meaning.

Imperative connectors make it possible to mix imperative and declarative statements. An example set of imperative connectors is given below:

The sequencing operation ";" separates instructions that are to be executed in sequence.

s1; s2; ...; sn

will execute every statement and return the evaluation of sn.

The concurrent operators ||∧ and ||∨ separate instructions that are to be executed concurrently. They have logical-And-like and logical-Or-like semantics, respectively. Basically, s1 ||∧ s2 concurrently evaluates s1 and s2, returning "none" when at least one of the si evaluates to "none"; "unit" is returned when both operands evaluate to "unit". s1 ||∨ s2 behaves in the following way: it concurrently evaluates s1 and s2, returning "unit" when at least one of the si evaluates to "unit", and returning "none" when both operands evaluate to "none". A more formal definition of the semantics is provided below.

The Else operator s1 Else s2 evaluates s1. If s1 evaluates to "none", then s2 is evaluated. The semantics is as follows:

var vf = s1. if (vf == none) then s2 else vf.

where vf is a fresh variable, i.e. it occurs neither in the current context nor in s1 or s2.

The Then operator s1 Then s2 evaluates s1. If s1 does not evaluate to "none", then s2 is evaluated. The semantics is as follows:

if (s1 != none) then (s2) else (none).

The Or operator s1 Or s2 evaluates s1. If s1 evaluates to "none", the evaluation of s2 is returned. If s1 evaluates to "unit", s2 is still evaluated but "unit" is returned, no matter what s2 evaluates to. The semantics is as follows:

if (s1 == none) then (s2) else (s2; unit).

The And operator s1 And s2 evaluates s1. If s1 evaluates to "unit", the evaluation of s2 is returned. If s1 evaluates to "none", s2 is still evaluated but "none" is returned. The semantics is as follows:

if (s1 != none) then (s2) else (s2; none).

For the operators And, Or, ||∧ and ||∨, s1 and s2 have to evaluate to "unit" or "none", meaning that they have to be imperative expressions.
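The sequential connectors can be transcribed almost literally into Python, which makes the difference between short-circuiting (Else, Then) and always-evaluating (Or, And) connectors easy to see. This is a sketch under stated assumptions: statements are modelled as zero-argument callables, NONE and UNIT are hypothetical sentinel objects standing in for "none" and "unit", and the names seq, else_, then, or_ and and_ are illustrative. The concurrent operators are not modelled.

```python
from typing import Any, Callable

NONE = object()   # stands in for the distinguished value "none"
UNIT = object()   # stands in for "unit", returned by imperative statements

Stmt = Callable[[], Any]   # a statement is evaluated by calling it


def seq(*stmts: Stmt) -> Any:
    """s1; s2; ...; sn: execute every statement, return the value of sn."""
    result: Any = UNIT
    for s in stmts:
        result = s()
    return result


def else_(s1: Stmt, s2: Stmt) -> Any:
    """var vf = s1. if (vf == none) then s2 else vf"""
    vf = s1()
    return s2() if vf is NONE else vf


def then(s1: Stmt, s2: Stmt) -> Any:
    """if (s1 != none) then (s2) else (none)"""
    return s2() if s1() is not NONE else NONE


def or_(s1: Stmt, s2: Stmt) -> Any:
    """if (s1 == none) then (s2) else (s2; unit)"""
    if s1() is NONE:
        return s2()
    s2()                # s2 is still evaluated ...
    return UNIT         # ... but its result is discarded


def and_(s1: Stmt, s2: Stmt) -> Any:
    """if (s1 != none) then (s2) else (s2; none)"""
    if s1() is not NONE:
        return s2()
    s2()                # s2 is still evaluated ...
    return NONE         # ... but "none" is returned
```

Using identity checks against the sentinels (is NONE) keeps "none" distinct from every ordinary value, including Python's own None, 0 and the empty string.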

Some introductory examples are now provided to become familiar with the notation, and then the expressiveness brought by the technique is demonstrated with more complex examples. Variables i and str1 are assumed to be of type Integer and String, respectively.

A simple sequence is made of an assignment statement followed by a closure ("while") loop containing a rule; this loop increments i until it reaches 10.

i := 0;
*((i < 10) -> i := i + 1)
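The same loop can be reproduced in Python to show how a rule returning "none" is what terminates a closure. A minimal sketch, assuming hypothetical NONE/UNIT sentinels, zero-argument callables as statements, and a dict named state holding the variable i; rule, closure and increment are illustrative names, not the language's own API.

```python
from typing import Any, Callable, Dict

NONE = object()   # stands in for "none"
UNIT = object()   # stands in for "unit"
Stmt = Callable[[], Any]


def rule(cond: Callable[[], bool], action: Stmt) -> Stmt:
    """e -> s: evaluate s only if the condition holds, otherwise return none."""
    return lambda: action() if cond() else NONE


def closure(s: Stmt) -> Any:
    """*(s): iterate s until it evaluates to none, then return unit."""
    while s() is not NONE:   # loop form of: if (s != none) then *(s) else unit
        pass
    return UNIT


state: Dict[str, int] = {"i": 0}   # i := 0


def increment() -> Any:
    """i := i + 1; an assignment evaluates to unit."""
    state["i"] += 1
    return UNIT


closure(rule(lambda: state["i"] < 10, increment))
```

After the call, state["i"] is 10: the eleventh evaluation of the rule finds the condition false, returns none, and the closure stops.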

The following action system contains a rule, a nested action system made of rules, and an expression:

[|
   (i < 5) -> (str1 := 'cir'; 50-3),
   [|
      (i == 100) -> 'abcd',
      (i == 200) -> 'efgh'
   |],
   3.1415
|]

If i is less than 5, then str1 is assigned the string 'cir' and the action system returns 47. Otherwise, if i is equal to 100 or 200, the action system returns the string 'abcd' or 'efgh', respectively. Otherwise, it returns 3.1415. This last expression acts as a 'default' case. If it were not present, and if i had not previously been assigned a value less than 5 or equal to 100 or 200, the action system would have evaluated to "none".
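The ordered action system above can be emulated in Python to check the described outcomes. A sketch only, assuming a hypothetical NONE sentinel, zero-argument callables as statements and a dict env for the variables i and str1; rule, action_system and example are illustrative names.

```python
from typing import Any, Callable, Dict

NONE = object()   # stands in for "none"
Stmt = Callable[[], Any]


def rule(cond: Callable[[], bool], action: Stmt) -> Stmt:
    """e -> s: evaluate s only if the condition holds, otherwise return none."""
    return lambda: action() if cond() else NONE


def action_system(*components: Stmt) -> Stmt:
    """[| s1, ..., sn |]: return the first result that is not none, else none."""
    def run() -> Any:
        for s in components:
            result = s()
            if result is not NONE:
                return result
        return NONE
    return run


env: Dict[str, Any] = {"i": 0, "str1": ""}


def assign_cir() -> Any:
    """(str1 := 'cir'; 50-3): the sequence returns its last value."""
    env["str1"] = "cir"
    return 50 - 3


example = action_system(
    rule(lambda: env["i"] < 5, assign_cir),
    action_system(
        rule(lambda: env["i"] == 100, lambda: "abcd"),
        rule(lambda: env["i"] == 200, lambda: "efgh"),
    ),
    lambda: 3.1415,               # default case
)
```

Because a nested action system is itself just a statement that may return none, it composes with rules without any special case, which is the point the example makes.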



It should be noted that action system components do not necessarily contain a simple expression like 50-3 or 'abcd' as the last statement. They can also contain imperative expressions; in that case, the action system returns "unit".

The next more complex example makes use of the pattern matching features 
described above. Basically, pattern matching expressions can be used in 
place of boolean expressions in the left hand side of rules; they enable the 
recognition of patterns and extraction of parts of a data structure. A pattern 
matching expression evaluates to true if the subject matches and then the rule 
fires. Otherwise it evaluates to false and the rule returns "none". 

In the present example, some structural data about an XML file stored in the string variable str1 are extracted. To this end, several numeric variables are created which count:

nb_open_tags        number of opening tags
nb_close_tags       number of closing tags
nb_emp_tags         number of empty tags
nb_prefixed_tags    number of prefixed tags (the prefix represents a namespace)
nb_xrce_tags        number of tags having the name 'xrce'
nb_other_tags       number of tags which do not have a prefix and are not labeled 'xrce'



The first three tests identify the kind of the encountered tag, i.e. opening, closing or empty, whereas the three others examine the content of that tag. Since the last three tests are independent of the previous ones and relevant only when a new tag is encountered, the tests are split into two action systems combined with a Then connector. The variable tags is a string to which each tag name is appended. When all tags have been tested, the results are displayed.

*(
   [|
      str1 # ? ++ %'<' ++ ?str2 ++ %'>' ++ ?str1 ->
         nb_open_tags := nb_open_tags + 1,
      str1 # ? ++ %'</' ++ ?str2 ++ %'>' ++ ?str1 ->
         nb_close_tags := nb_close_tags + 1,
      str1 # ? ++ %'<' ++ ?str2 ++ %'/>' ++ ?str1 ->
         nb_emp_tags := nb_emp_tags + 1
   |]
   Then
   [|
      str2 # ? ++ %':' ++ ? ->
         nb_prefixed_tags := nb_prefixed_tags + 1,
      str2 # %'xrce' -> nb_xrce_tags := nb_xrce_tags + 1,
      nb_other_tags := nb_other_tags + 1
   |]
   Then
   (tags := tags + str2)
);

print('number of opening tags: ' + str(nb_open_tags));
print('number of closing tags: ' + str(nb_close_tags));
print('number of empty tags: ' + str(nb_emp_tags));
print('number of prefixed tags: ' + str(nb_prefixed_tags));
print('number of xrce tags: ' + str(nb_xrce_tags));
print('number of other tags: ' + str(nb_other_tags));
print(tags)



In this example, an action system, i.e. a declarative statement, is composed with another action system and with an assignment instruction, i.e. an imperative statement, and the three are nested in a closure loop, i.e. an imperative statement.

To illustrate the compactness and readability provided by the proposed technique, the same program coded with only the closure loop and if-then-else statements is presented in the following. For simplicity, it is assumed that boolean expressions in conditions can be replaced by pattern-matching expressions.



*(
   if (length(str1) > 0) then (
      if (str1 # ? ++ %'<' ++ ?str2 ++ %'>' ++ ?str1) then (
         if (str2 # ? ++ %':' ++ ?) then (
            nb_open_tags := nb_open_tags + 1;
            nb_prefixed_tags := nb_prefixed_tags + 1
         )
         else (
            if (str2 # %'xrce' ++ ?) then (
               nb_open_tags := nb_open_tags + 1;
               nb_xrce_tags := nb_xrce_tags + 1
            )
            else (
               nb_open_tags := nb_open_tags + 1;
               nb_other_tags := nb_other_tags + 1
            )
         );
         tags := tags + str2
      )
      else (
         if (str1 # ? ++ %'</' ++ ?str2 ++ %'>' ++ ?str1) then (
            if (str2 # ? ++ %':' ++ ?) then (
               nb_close_tags := nb_close_tags + 1;
               nb_prefixed_tags := nb_prefixed_tags + 1
            )
            else (
               if (str2 # %'xrce' ++ ?) then (
                  nb_close_tags := nb_close_tags + 1;
                  nb_xrce_tags := nb_xrce_tags + 1
               )
               else (
                  nb_close_tags := nb_close_tags + 1;
                  nb_other_tags := nb_other_tags + 1
               )
            );
            tags := tags + str2
         )
         else (
            if (str1 # ? ++ %'<' ++ ?str2 ++ %'/>' ++ ?str1) then (
               if (str2 # ? ++ %':' ++ ?) then (
                  nb_emp_tags := nb_emp_tags + 1;
                  nb_prefixed_tags := nb_prefixed_tags + 1
               )
               else (
                  if (str2 # %'xrce' ++ ?) then (
                     nb_emp_tags := nb_emp_tags + 1;
                     nb_xrce_tags := nb_xrce_tags + 1
                  )
                  else (
                     nb_emp_tags := nb_emp_tags + 1;
                     nb_other_tags := nb_other_tags + 1
                  )
               );
               tags := tags + str2
            )
            else (none)
         )
      )
   )
   else (none)
);

print('number of opening tags: ' + str(nb_open_tags));
print('number of closing tags: ' + str(nb_close_tags));
print('number of empty tags: ' + str(nb_emp_tags));
print('number of prefixed tags: ' + str(nb_prefixed_tags));
print('number of xrce tags: ' + str(nb_xrce_tags));
print('number of other tags: ' + str(nb_other_tags));
print(tags)

Using if-then-else statements, 3 × 3 = 9 cases have to be dealt with separately. In languages like C or Java, a Switch construct would have produced a more readable solution, but unfortunately, Switch can only be used with primitive integer and char/byte types, and test values have to be statically defined.

It will be appreciated that the technique described above enables the definition of basic imperative and declarative statements, the combination of both kinds of operations in meaningful and arbitrarily complex control structures, and the typing of such constructions.

Main applications are language construction, refining control abstractions and programming models toward more computational expressiveness, rewriting systems theory, classical conditional term rewriting systems, explicit strategies, transformation languages and systems for compilation and language processing, structured document transformation, tree pattern matching, event management, explicit or automatic document transformation systems, and others.

The technique provides material for building languages which are less abstract and more general than rewriting systems, but abstract enough to simplify and extend general-purpose programming languages. The technique could hence play an important role in the design of new transformation techniques or the extension of existing ones. Further, the technique may solve data and document interchange problems, which will lead to low-cost and high-quality solutions.

Moreover, the technique provides a programming language framework through the definition of generic syntactic constructs, their operational semantics and the associated formal type system. The major difficulties which underlie the ambitious goal of bringing together declarative and imperative programming styles are solved together, in a conceptually new and generic approach.

It will be appreciated that the bi-valuation technique enables and complements the symmetrical structural pattern matching technique described above. The technique is further particularly suited for supporting general typing mechanisms. Type checking improves the global reliability of programming languages and provides efficient means for optimization, thus increasing runtime performance. This will be described in more detail below.

By means of bi-valuation of programming statements, and by using imperative connectors, the design of new programming languages or the extension of existing programming languages is rendered possible with constructions that provide innovative expressiveness. The resulting languages are located at an intermediate level of abstraction, between declarative, functional and imperative languages. This kind of abstraction could play an important role in general structure transformation technologies. When building modern programming languages with an adequate control abstraction level, the major difficulty is to find the best trade-off between simplicity and power, conciseness and expressiveness. It will therefore be appreciated that the present technique provides programming constructs in such a way that the constructs offer compositional capabilities. Further, connectors are provided that allow the composition of such constructs with relevant semantics. Further, a sound type system is provided which makes it possible to detect composition errors.

Statement Evaluation Syntax and Operational Semantics 

30 In the following, the syntax and semantics of the basic conditional and 
imperative constructs are described in more detail. 

In the testing operation, environment modifications are allowed during 
evaluation of the boolean part. This enables the use of complex operations 
35 with effects on context as a condition for the test: 
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r h if ex then e 2 else e 3 T' I- if e' x then e 2 else e 3 L J 
if true then e 2 else e 3 -» e 2 [ifl] 
if false then e 2 else e 3 -}■ e 3 [if2] 



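The sequencing operator is defined as shown hereafter, where v denotes a value:

$$\frac{\Gamma \vdash e_1 \rightarrow \Gamma' \vdash e_1'}{\Gamma \vdash e_1; e_2 \rightarrow \Gamma' \vdash e_1'; e_2} \qquad \Gamma \vdash v; e_2 \rightarrow \Gamma \vdash e_2$$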
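The following hypothetical Python sketch, using the same kind of big-step encoding (expressions as functions of a dict environment), shows the intended behavior of sequencing: the first statement is evaluated only for its effects on the environment, and its value is discarded. The helper `log` is an assumption made for the example.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    """The distinguished values unit and none."""
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


UNIT, NONE = Special("unit"), Special("none")


def seq(e1: Expr, e2: Expr) -> Expr:
    """e1; e2: e1 is reduced for its effects, then e2 gives the result."""
    def run(env: Env) -> Any:
        e1(env)          # reduce e1 in the current environment; its value is discarded
        return e2(env)   # the result of the sequence is the result of e2
    return run


def log(msg: str) -> Expr:
    """Hypothetical imperative action: records msg in env['trace'] and returns unit."""
    def run(env: Env) -> Any:
        env.setdefault("trace", []).append(msg)
        return UNIT
    return run


env: Env = {}
result = seq(log("first"), seq(log("second"), lambda env: len(env["trace"])))(env)
print(result, env["trace"])  # 2 ['first', 'second']
```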



Further, an assignment operator is provided that finally reduces to a
distinguished value "unit". Similar mechanisms for typing imperative statements
exist in the prior art for integrating imperative extensions into the functional
framework; there, however, the result of an assignment operation x := e is just
the result of the expression e itself. Here, the assignment is reduced as follows:

$$\frac{\Gamma \vdash e \rightarrow \Gamma' \vdash e'}{\Gamma \vdash x := e \rightarrow \Gamma' \vdash x := e'}\ [\text{assg1}] \qquad \Gamma \vdash x := v \rightarrow \Gamma[x \mapsto v] \vdash \texttt{unit}\ [\text{assg2}]$$
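A minimal Python sketch, under the same assumed encoding, contrasts the assignment described here, which reduces to unit, with the prior-art variant whose result is the assigned value. Both function names are hypothetical.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    """The distinguished values unit and none."""
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


UNIT = Special("unit")


def assign(name: str, e: Expr) -> Expr:
    """x := e as described above: the whole assignment reduces to unit."""
    def run(env: Env) -> Any:
        value = e(env)     # [assg1]: reduce the right-hand side first
        env[name] = value  # [assg2]: bind the value in the environment ...
        return UNIT        # ... and return the distinguished value unit
    return run


def assign_prior_art(name: str, e: Expr) -> Expr:
    """For contrast: the prior-art form, whose result is the value of e itself."""
    def run(env: Env) -> Any:
        env[name] = e(env)
        return env[name]
    return run


env: Env = {}
print(assign("x", lambda env: 42)(env), env)      # unit {'x': 42}
print(assign_prior_art("y", lambda env: 7)(env))  # 7
```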



Moreover, an iterative construct is provided in the form of a closure operator,
noted *(e), which iterates an action e until it evaluates to the distinguished
value "none". This action must therefore be a declarative action, since applying
the closure to a purely imperative action such as an assignment would produce
an endless computation. This property is checked by the typing rules described
below. The semantics of the * operator is simply defined by:

$$*(e) \rightarrow \texttt{if } (e \mathrel{\texttt{!=}} \texttt{none}) \texttt{ then } *(e) \texttt{ else unit}\ [*]$$
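The closure operator can be sketched as a loop, assuming the same Python encoding; the loop is equivalent to unfolding the [*] rule repeatedly. The bi-valuated action `transfer_one` is hypothetical: it succeeds with unit while work remains and yields none once the work is exhausted, which is what stops the iteration.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


UNIT, NONE = Special("unit"), Special("none")


def star(e: Expr) -> Expr:
    """*(e): re-evaluate e until it yields none, then reduce to unit ([*])."""
    def run(env: Env) -> Any:
        while e(env) is not NONE:  # if (e != none) then *(e) ...
            pass
        return UNIT                # ... else unit
    return run


def transfer_one(env: Env) -> Any:
    """Hypothetical bi-valuated action: moves one item from 'todo' to 'done', or yields none."""
    if env["todo"]:
        env["done"].append(env["todo"].pop(0))
        return UNIT
    return NONE


env = {"todo": ["a", "b", "c"], "done": []}
print(star(transfer_one)(env), env["done"])  # unit ['a', 'b', 'c']
```

By contrast, `star` applied to an action that always returns unit, such as an assignment, would never terminate in this sketch, which is the situation the typing rules are meant to exclude.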


A closure operation, which always returns the distinguished value "unit", is
considered a pure imperative (mono-valuated) statement.



Declarative constructs are defined as shown hereafter:

e ::= e => e'               rule
    | [| e1, ..., en |]     action system (ordered)
    | {| e1, ..., en |}     action system (unordered)
As mentioned above, the first universal construction is the rule, made up of a
left-hand condition and a right-hand action that is executed only if the
condition can be verified. The proposed syntax, and the type system that
follows, allow rules to be cascaded in the form e1 => e2 => e3, understood as
e1 => (e2 => e3):

$$e_1 \Rightarrow e_2 \rightarrow \texttt{if } (e_1 \mathrel{\texttt{==}} \texttt{true}) \texttt{ then } e_2 \texttt{ else none}\ [\text{Rule}]$$
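A sketch of the rule construct and of its cascading reading, under the same assumed Python encoding (`rule` and `small_positive` are hypothetical names):

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


NONE = Special("none")


def rule(cond: Expr, action: Expr) -> Expr:
    """e1 => e2 reduces to: if (e1 == true) then e2 else none ([Rule])."""
    def run(env: Env) -> Any:
        return action(env) if cond(env) is True else NONE
    return run


# Cascading e1 => e2 => e3 is read as e1 => (e2 => e3).
small_positive = rule(lambda env: env["x"] > 0,
                      rule(lambda env: env["x"] < 10,
                           lambda env: "small positive"))

print(small_positive({"x": 5}))   # small positive
print(small_positive({"x": 50}))  # none
```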



In declarative environments, such rules are usually assembled into larger
systems. The rule choice and application strategy varies considerably across the
proposed models, but the most common tactics are "try rules in order", "try any
rule" and "iterate application as much as possible". These standard models, as
well as more sophisticated tactics, are easily emulated by combining the action
systems with the closure operator presented above.



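The ordered action system consists of a succession of actions to be tried in
the given order. In the following definition, v_f denotes a "fresh" variable, i.e.,
one with a unique name:

$$\frac{n \geq 2}{[\!|\, e_1, \ldots, e_n \,|\!] \rightarrow \texttt{var } v_f = e_1.\ \texttt{if } (v_f \mathrel{\texttt{==}} \texttt{none}) \texttt{ then } [\!|\, e_2, \ldots, e_n \,|\!] \texttt{ else } v_f}\ [\text{Asys1}]$$

$$[\!|\, e \,|\!] \rightarrow e\ [\text{Asys2}]$$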



Note that this definition implies that all actions e_1, ..., e_{n-1} must be
bi-valuated, i.e., may evaluate to "none"; the last one can be mono-valuated.
In that case the whole system itself becomes mono-valuated, since a value
different from "none" will finally be returned.
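The ordered action system can be sketched as follows, assuming the same Python encoding; the loop unfolds [Asys1] until a single action remains, which is then evaluated as in [Asys2]. Because the last action of `classify` is mono-valuated, the whole system never returns none, as noted above. All names are hypothetical.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


NONE = Special("none")


def rule(cond: Expr, action: Expr) -> Expr:
    """e1 => e2: the action's value if the condition is true, otherwise none."""
    return lambda env: action(env) if cond(env) is True else NONE


def ordered(*actions: Expr) -> Expr:
    """[| e1, ..., en |]: try the actions in order, returning the first value other than none."""
    if not actions:
        raise ValueError("an action system needs at least one action")

    def run(env: Env) -> Any:
        for action in actions[:-1]:
            v = action(env)        # var v_f = e_i
            if v is not NONE:      # stop at the first value other than none
                return v
        return actions[-1](env)    # [Asys2]: [| e |] -> e
    return run


classify = ordered(
    rule(lambda env: env["x"] < 0, lambda env: "negative"),
    rule(lambda env: env["x"] == 0, lambda env: "zero"),
    lambda env: "positive",        # mono-valuated last action: never none
)
print([classify({"x": x}) for x in (-3, 0, 8)])  # ['negative', 'zero', 'positive']
```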

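Unordered action systems are defined similarly, except that the action e_i is
chosen (randomly) among all the others:

$$\frac{n \geq 2, \quad i \in \{1, \ldots, n\}}{\{\!|\, e_1, \ldots, e_n \,|\!\} \rightarrow \texttt{var } v_f = e_i.\ \texttt{if } (v_f \mathrel{\texttt{==}} \texttt{none}) \texttt{ then } \{\!|\, e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_n \,|\!\} \texttt{ else } v_f}\ [\text{Asysb1}]$$

$$\{\!|\, e \,|\!\} \rightarrow e\ [\text{Asysb2}]$$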
One consequence of this definition is that here all actions must be
bi-valuated, in the sense given above. Such an action system specifies a
permutation-equivalent collection of rules, which opens significant
opportunities for run-time optimization. Moreover, it allows the programmer to
avoid producing over-specified code, making the code more understandable to
external code reviewers or maintainers.
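A possible sketch of the unordered action system, under the same assumed Python encoding, picks a remaining action at random at each step. The optional `rng` parameter is an addition for reproducible runs, not part of the definition. Because the three rules of `sign` have mutually exclusive conditions, they are permutation equivalent and the result does not depend on the order of choice.

```python
import random
from typing import Any, Callable, Dict, List, Optional

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


NONE = Special("none")


def rule(cond: Expr, action: Expr) -> Expr:
    """e1 => e2: the action's value if the condition is true, otherwise none."""
    return lambda env: action(env) if cond(env) is True else NONE


def unordered(*actions: Expr, rng: Optional[random.Random] = None) -> Expr:
    """{| e1, ..., en |}: repeatedly try a randomly chosen remaining action."""
    if not actions:
        raise ValueError("an action system needs at least one action")
    chooser = rng if rng is not None else random.Random()

    def run(env: Env) -> Any:
        remaining: List[Expr] = list(actions)
        while len(remaining) > 1:
            i = chooser.randrange(len(remaining))  # random choice of e_i
            v = remaining.pop(i)(env)              # var v_f = e_i
            if v is not NONE:
                return v                           # else branch: v_f
        return remaining[0](env)                   # [Asysb2]: {| e |} -> e
    return run


sign = unordered(
    rule(lambda env: env["x"] < 0, lambda env: -1),
    rule(lambda env: env["x"] == 0, lambda env: 0),
    rule(lambda env: env["x"] > 0, lambda env: 1),
    rng=random.Random(7),
)
print([sign({"x": x}) for x in (-4, 0, 9)])  # [-1, 0, 1]
```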

Turning now to the imperative connectors, the term "imperative connector" has
been chosen by reference to the so-called "Boolean connectors", in order to
underline the similarity: in both cases, values are bi-valuated. However, the
more subtle approach to valuation domains adopted here makes it possible to
handle several data sets.

The first useful connector is the well-known sequence operator ";", already
presented.

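Another useful set of operators are the concurrent execution (binary)
operators, noted e_1 ||∧ e_2 and e_1 ||∨ e_2, and also e_1 ||* e_2 for both,
equipped with the following interleaving semantics:

$$\frac{\Gamma \vdash e_1 \rightarrow \Gamma' \vdash e_1'}{\Gamma \vdash e_1 \parallel_* e_2 \rightarrow \Gamma' \vdash e_1' \parallel_* e_2}\ [\text{ParL}] \qquad \frac{\Gamma \vdash e_2 \rightarrow \Gamma' \vdash e_2'}{\Gamma \vdash e_1 \parallel_* e_2 \rightarrow \Gamma' \vdash e_1 \parallel_* e_2'}\ [\text{ParR}]$$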

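The first operator, ||∧, besides its concurrent semantics, behaves like a logical
"and", commonly noted ∧, with respect to the management of the bi-valuation:

$$\texttt{unit} \parallel_\wedge e_2 \rightarrow e_2\ [\text{Par1a}] \qquad \texttt{none} \parallel_\wedge e_2 \rightarrow e_2; \texttt{none}\ [\text{Par1b}]$$

$$e_1 \parallel_\wedge \texttt{none} \rightarrow e_1; \texttt{none}\ [\text{Par1c}] \qquad e_1 \parallel_\wedge \texttt{unit} \rightarrow e_1\ [\text{Par1d}]$$

With ||∨, the equivalent "or" behavior, commonly noted ∨, is expressed:

$$\texttt{unit} \parallel_\vee e_2 \rightarrow e_2; \texttt{unit}\ [\text{Par2a}] \qquad \texttt{none} \parallel_\vee e_2 \rightarrow e_2\ [\text{Par2b}]$$

$$e_1 \parallel_\vee \texttt{none} \rightarrow e_1\ [\text{Par2c}] \qquad e_1 \parallel_\vee \texttt{unit} \rightarrow e_1; \texttt{unit}\ [\text{Par2d}]$$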

This definition describes a simple concurrent semantics that can be extended
to a wider scope. The important point here is to define this concurrent
composition as a bi-valuated statement consistent with the global framework.
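The interleaving rules and the bi-valuation behavior of the two concurrent operators can be approximated with Python generators, where each call to `next()` plays the role of one reduction step and a generator's return value is the value its operand reduces to. This encoding, the `mode` strings and the `worker` helper are assumptions for illustration; operand results are assumed to range over unit and none, and a seeded `random.Random` stands in for the scheduler.

```python
import random
from typing import Dict, Generator, List


class Special:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


UNIT, NONE = Special("unit"), Special("none")

# A process yields once per reduction step and finally returns unit or none.
Process = Generator[None, None, Special]


def par(p1: Process, p2: Process, mode: str, rng: random.Random) -> Special:
    """e1 ||∧ e2 (mode 'and') or e1 ||∨ e2 (mode 'or') with random interleaving."""
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    running: Dict[int, Process] = {0: p1, 1: p2}
    results: Dict[int, Special] = {}
    while running:
        k = rng.choice(sorted(running))   # [ParL] or [ParR]: pick a side to step
        try:
            next(running[k])
        except StopIteration as stop:     # that side has reduced to a value
            results[k] = stop.value
            del running[k]
    r1, r2 = results[0], results[1]       # both operands always run to completion
    if mode == "and":                     # [Par1a]-[Par1d]: none if either side is none
        return NONE if NONE in (r1, r2) else UNIT
    return UNIT if UNIT in (r1, r2) else NONE  # [Par2a]-[Par2d]: unit if either side is unit


def worker(name: str, steps: int, result: Special, trace: List[str]) -> Process:
    """Hypothetical process: records `steps` trace entries, then returns `result`."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield
    return result


trace: List[str] = []
outcome = par(worker("a", 2, UNIT, trace), worker("b", 3, NONE, trace), "and", random.Random(0))
print(outcome, sorted(trace))  # none ['a0', 'a1', 'b0', 'b1', 'b2']
```

Whatever the interleaving, both operands run to completion in this sketch; only the way their results are combined differs between ||∧ and ||∨.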
Besides the concurrent imperative connectors, sequential imperative connectors
are also provided: Then, Else, And and Or.

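Then and Else are defined by:

$$e_1 \texttt{ Then } e_2 \rightarrow \texttt{if } (e_1 \mathrel{\texttt{!=}} \texttt{none}) \texttt{ then } e_2 \texttt{ else none}\ [\text{Then}]$$

$$e_1 \texttt{ Else } e_2 \rightarrow \texttt{var } v_f = e_1.\ \texttt{if } (v_f \mathrel{\texttt{==}} \texttt{none}) \texttt{ then } e_2 \texttt{ else } v_f\ [\text{Else}]$$

v_f is a fresh variable, i.e., one that occurs neither in the current context nor
in e_1 or e_2.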
The following And and Or operators are quite similar, except that they impose
the evaluation of both operands whatever result is computed. Another
difference is that the values computed by both operands can only range over
{unit, none}; in that sense, e_1 and e_2 are imperative expressions:

$$e_1 \texttt{ And } e_2 \rightarrow \texttt{if } (e_1 \mathrel{\texttt{!=}} \texttt{none}) \texttt{ then } e_2 \texttt{ else } (e_2; \texttt{none})\ [\text{And}]$$

$$e_1 \texttt{ Or } e_2 \rightarrow \texttt{if } (e_1 \mathrel{\texttt{==}} \texttt{none}) \texttt{ then } e_2 \texttt{ else } (e_2; \texttt{unit})\ [\text{Or}]$$
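A sketch of the four sequential connectors under the same assumed Python encoding shows the difference in evaluation discipline: Then and Else may skip their second operand, while And and Or always evaluate both. The `action` helper is hypothetical.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]
Expr = Callable[[Env], Any]


class Special:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


UNIT, NONE = Special("unit"), Special("none")


def then_(e1: Expr, e2: Expr) -> Expr:    # [Then]: e2 runs only if e1 is not none
    return lambda env: e2(env) if e1(env) is not NONE else NONE


def else_(e1: Expr, e2: Expr) -> Expr:    # [Else]
    def run(env: Env) -> Any:
        v = e1(env)                        # var v_f = e1
        return e2(env) if v is NONE else v
    return run


def and_(e1: Expr, e2: Expr) -> Expr:     # [And]: e2 is evaluated in every case
    def run(env: Env) -> Any:
        v1, v2 = e1(env), e2(env)
        return v2 if v1 is not NONE else NONE
    return run


def or_(e1: Expr, e2: Expr) -> Expr:      # [Or]: e2 is evaluated in every case
    def run(env: Env) -> Any:
        v1, v2 = e1(env), e2(env)
        return v2 if v1 is NONE else UNIT
    return run


def action(name: str, result: Special) -> Expr:
    """Hypothetical imperative action: logs its name in env['log'] and returns result."""
    def run(env: Env) -> Any:
        env["log"].append(name)
        return result
    return run


env: Env = {"log": []}
print(then_(action("fail", NONE), action("skipped", UNIT))(env), env["log"])
# none ['fail']
env["log"].clear()
print(and_(action("fail", NONE), action("cleanup", UNIT))(env), env["log"])
# none ['fail', 'cleanup']
```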

Type System 

A minimal formal type system that illustrates and enables the use of the
constructs is described hereafter. This type system is based on a subtyping
relation using inclusion polymorphism, provided by a non-discriminating union
type constructor. An example of a type hierarchy is shown in FIG. 5.

Where required, the type system adds other relevant tools to known
techniques for building type systems. It is described through "typing
judgments", logically organized by means of "typing equations".

The typing judgment γ ▷ e : t states that expression e has type t in the typing
context γ. The judgment γ ▷ t ≼ t' asserts that type t is a subtype of type t',
and a more original judgment, γ ▷ e :: t, says that the expression e has minimal
type t. Formally, this last relation is characterized by the following property:

$$\text{(Minimal type)}\quad \gamma \triangleright e :: t \iff \forall t',\ \gamma \triangleright e : t' \Rightarrow \gamma \triangleright t \preccurlyeq t'$$
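This notion of minimal type is useful in order to avoid overgeneralized
inferences due to the so-called subsumption rule (see [Sub]), while preserving
the convenience of the subtyping mechanism. The notation γ, x : t ▷ ...
expresses that the fact "x is a variable of type t" is registered in the typing
context. More precisely, γ, x : t is a typing context γ' such that
γ' = γ ∪ {x : t}. The following rules define the reflexive and transitive
subtyping relation ≼, for all well-formed types t, t', t_1, t_2, t_3 and the typing
context γ:

$$\gamma \triangleright t \preccurlyeq t\ [\text{Refl}] \qquad \frac{\gamma \triangleright t_1 \preccurlyeq t_2 \quad \gamma \triangleright t_2 \preccurlyeq t_3}{\gamma \triangleright t_1 \preccurlyeq t_3}$$

$$\frac{\gamma \triangleright e :: t}{\gamma \triangleright e : t}\ [\text{Min}] \qquad \frac{\gamma \triangleright e : t \quad \gamma \triangleright t \preccurlyeq t'}{\gamma \triangleright e : t'}\ [\text{Sub}]$$

$$\frac{\gamma \triangleright t_1 \preccurlyeq t \quad \gamma \triangleright t_2 \preccurlyeq t}{\gamma \triangleright t_1 \mid t_2 \preccurlyeq t}\ [\text{U1}] \qquad \frac{\gamma \triangleright t \preccurlyeq t_1}{\gamma \triangleright t \preccurlyeq t_1 \mid t_2}\ [\text{U2}] \qquad \frac{\gamma \triangleright t \preccurlyeq t_2}{\gamma \triangleright t \preccurlyeq t_1 \mid t_2}\ [\text{U3}]$$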
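Assuming that distinct base types are related only by [Refl] (the type hierarchy of FIG. 5 may add further relations, which this sketch ignores), the subtyping rules make t ≼ t' hold exactly when every base type occurring in t also occurs in t'. A minimal Python sketch of that check, with hypothetical class names `Base` and `Alt`:

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Base:
    """A base type such as int, string, bool, Unit or None."""
    name: str


@dataclass(frozen=True)
class Alt:
    """The non-discriminating union t1 | t2."""
    left: "Type"
    right: "Type"


Type = Union[Base, Alt]


def atoms(t: Type) -> FrozenSet[str]:
    """Flatten nested unions into the set of base types they mention."""
    if isinstance(t, Base):
        return frozenset({t.name})
    return atoms(t.left) | atoms(t.right)


def subtype(t1: Type, t2: Type) -> bool:
    """t1 ≼ t2, assuming distinct base types are unrelated."""
    return atoms(t1) <= atoms(t2)


UNIT_T, NONE_T = Base("Unit"), Base("None")
print(subtype(UNIT_T, Alt(UNIT_T, NONE_T)))               # True
print(subtype(Alt(UNIT_T, NONE_T), UNIT_T))               # False
print(subtype(Alt(NONE_T, UNIT_T), Alt(UNIT_T, NONE_T)))  # True
```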



Rules defining well-formedness conditions on types are not shown, and types
are assumed to be well formed in all the following definitions. An important type
constructor is the non-discriminating union, which provides inclusion
polymorphism.

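Basic typing of literals and variables is defined by the following axioms:

$$\emptyset \triangleright n :: \texttt{int}\ [\text{Num}] \qquad \emptyset \triangleright s :: \texttt{string}\ [\text{Str}] \qquad \gamma, x : t \triangleright x :: t\ [\text{Var}]$$

$$\emptyset \triangleright \texttt{true} :: \texttt{bool}\ [\text{Bool1}] \qquad \emptyset \triangleright \texttt{false} :: \texttt{bool}\ [\text{Bool2}]$$

$$\emptyset \triangleright \texttt{none} :: \texttt{None}\ [\text{None}] \qquad \emptyset \triangleright \texttt{unit} :: \texttt{Unit}\ [\text{Unit}]$$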



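Operators are typed by:

$$\frac{\star \in \{+, *, -, /\} \quad \gamma \triangleright e_1 : \texttt{int} \quad \gamma \triangleright e_2 : \texttt{int}}{\gamma \triangleright e_1 \star e_2 :: \texttt{int}} \qquad \frac{\gamma \triangleright e_1 : \texttt{string} \quad \gamma \triangleright e_2 : \texttt{string}}{\gamma \triangleright e_1 + e_2 :: \texttt{string}}$$

$$\frac{\gamma \triangleright e_1 : t \quad \gamma \triangleright e_2 : t'}{\gamma \triangleright e_1 \mathrel{\texttt{==}} e_2 :: \texttt{bool}}\ [\text{Eq}] \qquad \frac{\gamma \triangleright e_1 : t \quad \gamma \triangleright e_2 : t'}{\gamma \triangleright e_1 \mathrel{\texttt{!=}} e_2 :: \texttt{bool}}\ [\text{Neq}]$$

$$\frac{\gamma \triangleright e : \texttt{bool} \quad \gamma \triangleright e_1 :: t_1 \quad \gamma \triangleright e_2 :: t_2}{\gamma \triangleright \texttt{if } e \texttt{ then } e_1 \texttt{ else } e_2 :: t_1 \mid t_2} \qquad \frac{x \notin \mathrm{dom}(\gamma) \quad \gamma \triangleright e_1 : t \quad \gamma, x : t \triangleright e_2 :: t_2}{\gamma \triangleright \texttt{var } x = e_1.\, e_2 :: t_2}$$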
Note that [Eq] and [Neq] do not require the operands to have a common type.
That is, equality comparisons are applicable to every kind of data.
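The typing rules for literals and operators can be illustrated by a small minimal-type inferencer over a hypothetical tuple-based syntax, covering only a subset of the constructs, with union types represented as frozensets of base type names (an assumption that matches the subtyping rules when base types are unrelated). The `==` case accepts operands of different types, as [Eq] allows, and the conditional returns the union of its branch types.

```python
from typing import Any, Dict, FrozenSet

Ty = FrozenSet[str]                       # a union type, flattened to its base types
INT, STRING, BOOL = frozenset({"int"}), frozenset({"string"}), frozenset({"bool"})


def min_type(e: Any, ctx: Dict[str, Ty]) -> Ty:
    """Minimal type of e in context ctx, for a small hypothetical tuple syntax."""
    if isinstance(e, bool):               # true/false :: bool (tested before int)
        return BOOL
    if isinstance(e, int):                # numeric literal :: int
        return INT
    if isinstance(e, str):                # a bare name is a variable looked up in ctx
        return ctx[e]
    tag = e[0]
    if tag == "str":                      # ("str", text) is a string literal
        return STRING
    if tag in ("+", "*", "-", "/"):
        t1, t2 = min_type(e[1], ctx), min_type(e[2], ctx)
        if t1 <= INT and t2 <= INT:       # arithmetic on int operands
            return INT
        if tag == "+" and t1 <= STRING and t2 <= STRING:  # string concatenation
            return STRING
        raise TypeError(f"ill-typed operands for {tag}")
    if tag in ("==", "!="):               # operands must be typable, but types may differ
        min_type(e[1], ctx)
        min_type(e[2], ctx)
        return BOOL
    if tag == "if":                       # ("if", cond, then, else) :: t1 | t2
        if not min_type(e[1], ctx) <= BOOL:
            raise TypeError("the condition must be of type bool")
        return min_type(e[2], ctx) | min_type(e[3], ctx)
    raise ValueError(f"unknown construct {tag!r}")


expr = ("if", ("==", "x", ("str", "a")), 1, ("str", "one"))
print(sorted(min_type(expr, {"x": INT})))  # ['int', 'string']
```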



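Other basic constructs can be typed through:

$$\frac{\gamma \triangleright e_1 : \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: t_2}{\gamma \triangleright e_1; e_2 :: t_2} \qquad \frac{\gamma \triangleright x :: t \quad \gamma \triangleright e : t}{\gamma \triangleright x := e :: \texttt{Unit}}\ [\text{Assg}] \qquad \frac{\gamma \triangleright e :: \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright *(e) :: \texttt{Unit}}$$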

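Concurrent connectors are always bi-valuated imperatives:

$$\frac{\gamma \triangleright e_1 :: \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \parallel_\wedge e_2 :: \texttt{Unit} \mid \texttt{None}}\ [\text{ParAnd}] \qquad \frac{\gamma \triangleright e_1 :: \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \parallel_\vee e_2 :: \texttt{Unit} \mid \texttt{None}}\ [\text{ParOr}]$$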

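The following equations achieve very precise type control for "and-like"
concurrency:

$$\frac{\gamma \triangleright e_1 :: \texttt{Unit} \quad \gamma \triangleright e_2 :: \texttt{Unit}}{\gamma \triangleright e_1 \parallel_\wedge e_2 :: \texttt{Unit}}\ [\text{ParAnd1}] \qquad \frac{\gamma \triangleright e_1 : \texttt{None} \quad \gamma \triangleright e_2 : \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \parallel_\wedge e_2 :: \texttt{None}}\ [\text{ParAnd2}]$$

Similarly, for "or-like" concurrency:

$$\frac{\gamma \triangleright e_1 : \texttt{None} \quad \gamma \triangleright e_2 : \texttt{None}}{\gamma \triangleright e_1 \parallel_\vee e_2 :: \texttt{None}}\ [\text{ParOr1}]$$

$$\frac{\gamma \triangleright e_1 : \texttt{Unit} \quad \gamma \triangleright e_2 : \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \parallel_\vee e_2 :: \texttt{Unit}}\ [\text{ParOr2}] \qquad \frac{\gamma \triangleright e_1 : \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 : \texttt{Unit}}{\gamma \triangleright e_1 \parallel_\vee e_2 :: \texttt{Unit}}\ [\text{ParOr3}]$$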
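The precise rules for the concurrent connectors can be read as a function from the minimal types of the operands to the minimal type of the composition. In this hypothetical Python sketch, types are frozensets of base type names, and [ParAnd] and [ParOr] are assumed to serve as the fallback whenever no more precise rule applies.

```python
from typing import FrozenSet

Ty = FrozenSet[str]
UNIT_T, NONE_T = frozenset({"Unit"}), frozenset({"None"})
BI = UNIT_T | NONE_T                   # Unit|None


def type_par_and(t1: Ty, t2: Ty) -> Ty:
    """Minimal type of e1 ||∧ e2 from the minimal types t1 and t2 of its operands."""
    if not (t1 <= BI and t2 <= BI):
        raise TypeError("operands of ||∧ must be of type Unit|None")
    if t1 == NONE_T:                   # [ParAnd2]: left operand is None
        return NONE_T
    if t1 == UNIT_T and t2 == UNIT_T:  # both operands Unit
        return UNIT_T
    return BI                          # [ParAnd] as fallback


def type_par_or(t1: Ty, t2: Ty) -> Ty:
    """Minimal type of e1 ||∨ e2 from the minimal types of its operands."""
    if not (t1 <= BI and t2 <= BI):
        raise TypeError("operands of ||∨ must be of type Unit|None")
    if t1 == UNIT_T or t2 == UNIT_T:   # [ParOr2], [ParOr3]
        return UNIT_T
    if t1 == NONE_T and t2 == NONE_T:  # both operands None
        return NONE_T
    return BI                          # [ParOr] as fallback


print(sorted(type_par_and(NONE_T, BI)), sorted(type_par_or(BI, UNIT_T)))  # ['None'] ['Unit']
```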

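Sequential connectors are typed as follows, for some types t, t':

$$\frac{\gamma \triangleright e_1 :: \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: t}{\gamma \triangleright e_1 \texttt{ Then } e_2 :: t \mid \texttt{None}}\ [\text{Then}] \qquad \frac{\gamma \triangleright e_1 :: t \mid \texttt{None} \quad \gamma \triangleright e_2 :: t'}{\gamma \triangleright e_1 \texttt{ Else } e_2 :: t \mid t'}\ [\text{Else}]$$

$$\frac{\gamma \triangleright e_1 :: \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \texttt{ And } e_2 :: \texttt{Unit} \mid \texttt{None}}\ [\text{And}] \qquad \frac{\gamma \triangleright e_1 :: \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \texttt{ Or } e_2 :: \texttt{Unit} \mid \texttt{None}}\ [\text{Or}]$$

Beyond these last two equations, and as for the concurrent connectors, much
more precise type checking is provided in order to detect composition errors: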
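$$\frac{\gamma \triangleright e_1 :: \texttt{None} \quad \gamma \triangleright e_2 : \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \texttt{ And } e_2 :: \texttt{None}}\ [\text{And1}] \qquad \frac{\gamma \triangleright e_1 : \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{None}}{\gamma \triangleright e_1 \texttt{ And } e_2 :: \texttt{None}}\ [\text{And2}]$$

$$\frac{\gamma \triangleright e_1 :: \texttt{Unit} \quad \gamma \triangleright e_2 :: \texttt{Unit}}{\gamma \triangleright e_1 \texttt{ And } e_2 :: \texttt{Unit}}\ [\text{And3}]$$

$$\frac{\gamma \triangleright e_1 :: \texttt{Unit} \quad \gamma \triangleright e_2 : \texttt{Unit} \mid \texttt{None}}{\gamma \triangleright e_1 \texttt{ Or } e_2 :: \texttt{Unit}}\ [\text{Or1}] \qquad \frac{\gamma \triangleright e_1 : \texttt{Unit} \mid \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{Unit}}{\gamma \triangleright e_1 \texttt{ Or } e_2 :: \texttt{Unit}}\ [\text{Or2}]$$

$$\frac{\gamma \triangleright e_1 :: \texttt{None} \quad \gamma \triangleright e_2 :: \texttt{None}}{\gamma \triangleright e_1 \texttt{ Or } e_2 :: \texttt{None}}\ [\text{Or3}]$$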
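Likewise, the And and Or typing rules can be sketched as functions of the operands' minimal types, under the same representation assumptions (frozensets of base type names, hypothetical function names); the fallback to Unit|None corresponds to the general [And] and [Or] rules. A tool built along these lines could, for instance, report an And whose minimal type is None, since such a composition can never succeed; that use is an illustration, not a rule stated here.

```python
from typing import FrozenSet

Ty = FrozenSet[str]
UNIT_T, NONE_T = frozenset({"Unit"}), frozenset({"None"})
BI = UNIT_T | NONE_T                   # Unit|None


def check_bi(t1: Ty, t2: Ty, op: str) -> None:
    """Both operands of And/Or must be imperative expressions of type Unit|None."""
    if not (t1 <= BI and t2 <= BI):
        raise TypeError(f"operands of {op} must be of type Unit|None")


def type_and(t1: Ty, t2: Ty) -> Ty:
    """Minimal type of e1 And e2."""
    check_bi(t1, t2, "And")
    if t1 == NONE_T or t2 == NONE_T:   # either operand None: the And yields None
        return NONE_T
    if t1 == UNIT_T and t2 == UNIT_T:  # both operands Unit
        return UNIT_T
    return BI                          # [And]


def type_or(t1: Ty, t2: Ty) -> Ty:
    """Minimal type of e1 Or e2."""
    check_bi(t1, t2, "Or")
    if t1 == UNIT_T or t2 == UNIT_T:   # either operand Unit: the Or yields Unit
        return UNIT_T
    if t1 == NONE_T and t2 == NONE_T:  # [Or3]
        return NONE_T
    return BI                          # [Or]


print(sorted(type_and(UNIT_T, NONE_T)), sorted(type_or(NONE_T, BI)))  # ['None'] ['None', 'Unit']
```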



The relevance of the typing approach described above with respect to the
semantics of the language constructs is now demonstrated in more detail.

The soundness property states that a well-typed construct cannot produce an
error during evaluation. More precisely, it establishes the conservation of the
well-typed property: any well-typed expression remains well typed over all
possible computation steps. A preliminary step is to define a logical relation
between the typing context and the execution environment:

The execution environment Γ conforms to the typing context γ (noted γ ⊩ Γ):

$$\gamma \Vdash \Gamma \iff \forall (x \mapsto v) \in \Gamma,\ \gamma \triangleright x :: t \,\wedge\, \emptyset \triangleright v :: t$$
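The conformance relation can be checked directly on a concrete environment. In this hypothetical Python sketch, types are frozensets of base type names, `value_type` gives the minimal type of a runtime value according to the literal axioms, and Γ is a dict from variable names to values.

```python
from typing import Any, Dict, FrozenSet

Ty = FrozenSet[str]


class Special:
    """The distinguished values unit and none."""
    def __init__(self, name: str) -> None:
        self.name = name


UNIT, NONE = Special("unit"), Special("none")


def value_type(v: Any) -> Ty:
    """Minimal type of a runtime value, following the literal axioms."""
    if isinstance(v, bool):            # checked first: bool is a subclass of int in Python
        return frozenset({"bool"})
    if isinstance(v, int):
        return frozenset({"int"})
    if isinstance(v, str):
        return frozenset({"string"})
    if v is UNIT:
        return frozenset({"Unit"})
    if v is NONE:
        return frozenset({"None"})
    raise TypeError(f"no literal type for {v!r}")


def conforms(gamma: Dict[str, Ty], env: Dict[str, Any]) -> bool:
    """gamma ⊩ env: each binding x ↦ v has gamma(x) equal to the minimal type of v."""
    return all(x in gamma and gamma[x] == value_type(v) for x, v in env.items())


gamma = {"n": frozenset({"int"}), "ok": frozenset({"bool"})}
print(conforms(gamma, {"n": 3, "ok": True}))    # True
print(conforms(gamma, {"n": "3", "ok": True}))  # False
```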

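In the following, e is understood as a non-reduced syntactic expression. The
type is preserved during reduction:

$$\forall e, \gamma, \Gamma \text{ such that } \gamma \Vdash \Gamma \text{ and } \gamma \triangleright e : t, \quad \Gamma \vdash e \rightarrow \Gamma' \vdash e' \implies \gamma \triangleright e' : t \,\wedge\, \gamma \Vdash \Gamma'$$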



The proof presents no particular difficulty; it uses an induction on the structure 
of e. 

The completeness property states that all derivations, i.e. computation steps,
defined by the operational semantics are covered by the type control
mechanism. Completeness is defined by:

$$\forall e, \gamma, \Gamma \text{ such that } \gamma \Vdash \Gamma, \quad \Gamma \vdash e \rightarrow \Gamma' \vdash e' \implies \exists t \text{ such that } \gamma \triangleright e : t \,\wedge\, \gamma \triangleright e' : t$$

As for the preservation of type, the proof presents no particular difficulty; it 
uses an induction on the structure of e. 


Using the foregoing specification, the invention may be implemented as a 
machine (or system), process (or method), or article of manufacture by using 
standard programming and/or engineering techniques to produce 
programming software, firmware, hardware, or any combination thereof. 


Any resulting program(s), having computer-readable program code, may be
embodied within one or more computer-usable media such as memory
devices or transmitting devices, thereby making a computer program product
or article of manufacture according to the invention. As such, the terms "article
of manufacture" and "computer program product" as used herein are intended
to encompass a computer program existent (permanently, temporarily, or
transitorily) on any computer-usable medium such as on any memory device
or in any transmitting device.

The invention has been described with reference to particular embodiments.
Modifications and alterations will occur to others upon reading and
understanding this specification taken together with the drawings. The
embodiments are but examples, and various alternatives, modifications,
variations or improvements may be made by those skilled in the art from this
teaching which are intended to be encompassed by the following claims.